Routing to different interfaces on the network node

Asked by Alexandre Bécholey

Hello,

We have successfully built a 10-node Grizzly OpenStack cloud (4 compute nodes and 6 storage nodes providing the different services). Our VMs can reach the outside through the network node running Quantum with the Open vSwitch plugin.

Here is a list of the services running on the network node:
- quantum-server
- quantum-dhcp-agent
- quantum-l3-agent
- quantum-metadata-agent
- quantum-plugin-openvswitch-agent
- openvswitch-switch

On the compute nodes:
- quantum-plugin-openvswitch-agent
- openvswitch-switch

Each node has:
- a 1 Gb interface for server management (say eth0 on 10.0.0.0/24)
- an InfiniBand adapter (IPoIB) for the "OpenStack traffic"; we don't use the Mellanox plugin (yet?) (say ib0 on 172.16.0.0/24)
- another InfiniBand adapter for storage (we use Ceph as the backend storage for Glance and Cinder) (say ib1 on 172.16.1.0/24)

One dedicated 1 Gb link is used for the VMs on the network node (eth1 in the bridge br-ex, 10.10.0.0/24).

We can assign floating IPs to VMs, SSH into them, and so on. The VMs can reach the outside through the dedicated link on the network node.

However, we need to reach the storage network (172.16.1.0/24 on ib1) from the VMs.

If I try to ping the storage network, I see all traffic going through eth1 and hitting the default gateway.
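For reference, I watched the traffic with something like:
tcpdump -n -i eth1 icmp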

We have set up some rules:
# ip rule
0: from all lookup local
1000: from 10.10.0.0/24 lookup vm
32766: from all lookup main
32767: from all lookup default

# ip route list table vm
default via 10.10.0.1 dev br-ex

If we add a route for 172.16.1.0/24 to that table (or to the default one), the packets are still sent to the default gateway. We also tried adding routes on the virtual router attached to the public network (10.10.0.0/24) and on the subnet, without success.
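The route we tried looked roughly like this (from memory; the exact device or gateway may have differed):

ip route add 172.16.1.0/24 dev ib1 table vm
ip route add 172.16.1.0/24 dev ib1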

So my question is: how can we reach other networks attached to other interfaces of the network node?
Is it a matter of putting the right routes in the right place, or should we build a new public network (with a new L3 agent, which is less desirable)?
If that is impossible, since the compute nodes are also on the storage network, can we simply add a virtual interface on those nodes (as we would on a standard hypervisor)? Could it be specified during VM creation?
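For instance, would something along these lines be the way to do it (the flavor and network IDs are just placeholders)?

nova boot --flavor <flavor> --image <image-id> --nic net-id=<standard-net-id> --nic net-id=<storage-net-id> test-vm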

You may ask why we don't simply create volumes and attach them to the VMs: it's because we need shared storage that is accessible (read-write) from multiple VMs for distributed computing.

Thanks for your answers; feel free to ask for more information.

Question information

Language: English
Status: Answered
For: neutron
Assignee: No assignee
Salvatore Orlando (salvatore-orlando) said :
#1

Have you considered provider networks?
I was thinking that:
- If you can boot VMs with multiple NICs, each VM could have a second NIC on a provider network mapped to the storage network, giving it direct access.
- If you don't want to change the VM layout to access the storage network, you can still try mapping it to a provider network and then attaching that network to the logical routers.

Even if you never launch a VM on this provider network, you should still be able to reach the hosts already running on it.
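A rough sketch of what I have in mind (all names, IDs and CIDRs below are just placeholders):

quantum net-create storage-net --shared --provider:network_type flat --provider:physical_network physnet-storage
quantum subnet-create storage-net 172.16.1.0/24 --name storage-subnet --disable-dhcp
# then either give the VMs a second NIC on storage-net, or attach the subnet to the logical router:
quantum router-interface-add <router-id> storage-subnet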

Alexandre Bécholey (alexandre-becholey) said :
#2

Thank you for your reply.

No, I hadn't, so I tried a flat provider network.

The main problem with InfiniBand is that you can't put an interface into a bridge (standard Linux or OVS), because IP over InfiniBand is used rather than Ethernet over InfiniBand, so the layer 2 does not seem to be the same between an Ethernet interface and an IB one.

I came up with the idea of creating a virtual ethernet interface with:
# ip link add type veth

This created two interfaces (veth0 and veth1).

I created a bridge in OVS, added one of the veth interfaces to it and assigned an IP address (different from the storage network).
Everything seems fine, as the routes are correct and I can ping the bridge's IP (from another host).
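Roughly, the commands looked like this (the exact address, and which end received it, are from memory):

ovs-vsctl add-br br-ceph
ovs-vsctl add-port br-ceph veth0
ip link set veth0 up
ip link set veth1 up
ip addr add 172.17.1.1/24 dev veth1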

In the OVS plugin settings (/etc/quantum/plugins/openvswitch/ovs_quantum_plugin.ini), I added the following lines:
network_vlan_ranges = ceph-net:4090:4095 # dummy numbers, as we use GRE and not VLANs
bridge_mappings = ceph-net:br-ceph

I had to do this on each compute node and on the network node (every node running the OVS Quantum agent) to avoid errors such as:
ERROR [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Cannot provision flat network for net-id=... - no bridge for physical_network br-ceph

I created the network in Quantum after restarting the OVS agent:
quantum net-create ceph-net --tenant_id ... --provider:network_type flat --provider:physical_network br-ceph

I also shared the network, and I created the subnet in the range of br-ceph's IP (no DHCP yet).
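The subnet creation was roughly (the CIDR is from memory):

quantum subnet-create ceph-net 172.17.1.0/24 --name ceph-subnet --disable-dhcp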

When I boot a VM with one interface on our "standard" network and one on ceph-net, I can access the VM as usual, but if I set an IP address on the interface connected to ceph-net I can't ping anything (i.e. I don't see any traffic on br-ceph or on the veth attached to it). I also don't see any new interface added to the bridge.
On the network node:
http://pastebin.com/mmFU5UbF

On the compute node:
http://pastebin.com/kvgCv86Z

Am I missing something, for example an agent/plugin to add on the compute nodes?

Salvatore Orlando (salvatore-orlando) said :
#3

Can you at least see the logical Neutron ports for the VM interfaces on ceph-net?
This will help us debug the root cause of the issue.
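For example, something along these lines (IDs are placeholders; the filter follows the usual quantum CLI convention):

quantum port-list -- --device_id=<instance-uuid>
quantum port-show <port-id>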

Alexandre Bécholey (alexandre-becholey) said :
#4

Yeah I can see one port:
quantum port-show 5645e5d5-362a-498c-9864-b19db1034531
+-----------------+-------------------------------------------------------------------------------------+
| Field | Value |
+-----------------+-------------------------------------------------------------------------------------+
| admin_state_up | True |
| device_id | 38eac7dd-2a5b-4b26-9ea4-3ba9c63d0315 |
| device_owner | compute:None |
| fixed_ips | {"subnet_id": "f1225ed7-1560-4e0f-a70d-d66200c1a282", "ip_address": "172.17.1.102"} |
| id | 5645e5d5-362a-498c-9864-b19db1034531 |
| mac_address | fa:16:3e:00:68:69 |
| name | |
| network_id | bf16ed0b-ecd4-4992-b19c-7d1d2c7dc76c |
| security_groups | 9c6b116a-83e1-4c27-84bd-14a216f0a854 |
| status | ACTIVE |
| tenant_id | 080b784baeea487587786b160e2f30b5 |
+-----------------+-------------------------------------------------------------------------------------+

On Horizon:
Name
    None
ID
    5645e5d5-362a-498c-9864-b19db1034531
Network ID
    bf16ed0b-ecd4-4992-b19c-7d1d2c7dc76c
Project ID
    080b784baeea487587786b160e2f30b5
Fixed IP
    IP address: 172.17.1.102, Subnet ID f1225ed7-1560-4e0f-a70d-d66200c1a282
Mac Address
    fa:16:3e:00:68:69
Status
    ACTIVE
Admin State
    UP
Attached Device
    Device Owner: compute:None
    Device ID: 38eac7dd-2a5b-4b26-9ea4-3ba9c63d0315

The Network ID corresponds to ceph-net.

Salvatore Orlando (salvatore-orlando) said :
#5

If you can only see a single port for your VM, then Nova is not correctly interpreting your request to boot the VM with two NICs.
I don't think it's failing to create the second port; in that case you would indeed get a failure while booting the instance, as network provisioning is currently processed synchronously with the create-server request.

I am not sure how this could be happening. Looking at your nova-compute logs you should be able to see the requests made to the Neutron server, and in the nova-api log you should be able to see the request received, so you can check the network interface spec in the JSON request body.
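For example, something like this (assuming the default Ubuntu log locations; adjust the paths as needed):

grep -i quantum /var/log/nova/nova-compute.log
grep -i networks /var/log/nova/nova-api.log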

Alexandre Bécholey (alexandre-becholey) said :
#6

Sorry about the confusion... I meant that I can only see a single port for ceph-net, but the VM does indeed have two ports: one on our standard network (which works: floating IP and so on) and the other on ceph-net. In the VM I can see two interfaces, eth0 and eth1.
eth0 is configured by DHCP from our standard network, and I configure eth1 manually.

Salvatore Orlando (salvatore-orlando) said :
#7

That makes more sense, as we can narrow down the issue to the Open vSwitch agent.

When you posted your configuration above, I did not notice that ceph-net was an OVS bridge and that you're trying to plug VIFs directly into an OVS bridge other than br-int; the Nova VIF drivers are unfortunately unable to do that.

The port directed to ceph-net indeed landed on br-int:

        Port "tap5645e5d5-36"
            tag: 2
            Interface "tap5645e5d5-36"

You already said that you are unable to plug the InfiniBand interface directly into ceph-net.
But I reckon you should be able to create another veth pair between the br-int and ceph-net OVS bridges.
That might work, but it might also start adding some serious latency.
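Something along these lines might do it (the interface names are only placeholders):

ip link add name veth-int type veth peer name veth-ceph
ovs-vsctl add-port br-int veth-int
ovs-vsctl add-port br-ceph veth-ceph
ip link set veth-int up
ip link set veth-ceph up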

You also mentioned that your VLAN tags are dummy values because you're using GRE. Can you clarify whether you meant the tenant networks, or whether you're using GRE tunnels to reach into your storage network?

Alexandre Bécholey (alexandre-becholey) said :
#8

We followed the installation steps in the OpenStack documentation to set up the network tunnels in Open vSwitch, so we use GRE for the tenant network. Here is the configuration of the plugin on the network node (without the database settings):

[OVS]
tenant_network_type = gre

enable_tunneling = True
tunnel_id_ranges = 1:1000
local_ip = 172.16.0.23

# added for the provider network you talked about
network_vlan_ranges = ceph-net:4090:4095
bridge_mappings = ceph-net:br-ceph

[AGENT]
polling_interval = 2

[SECURITYGROUP]
firewall_driver = quantum.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver

Let's recap a little...
- Our InfiniBand cards are dual-port, so we have two interfaces on our nodes: ib0 and ib1
- OpenStack traffic (inter-service traffic as well as the GRE tunnels created by Open vSwitch) goes through one InfiniBand network: the ib0 interface, 172.16.0.0/24
- Ceph traffic goes over another InfiniBand network: ib1, 172.16.1.0/24
- If I add ib1 to a bridge, it doesn't work
- Glance and Cinder use Ceph; we boot the VMs from volumes created from images (so we can use copy-on-write in Ceph), but I don't think that's relevant to this problem

What we want to achieve is the fastest possible access from the VMs to the storage, for MapReduce-style processing (maybe using Savanna for Hadoop). I thought that mounting CephFS directly in the VMs might be the solution. I really like your idea of creating a provider network.
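For instance, something like this from inside a VM (the monitor address and credentials are just placeholders):

mount -t ceph 172.16.1.10:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret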

Do you think that the steps I took to create the provider network are OK?

Alexandre Bécholey (alexandre-becholey) said :
#9

Do you need more information like configuration files or logs?

Taurus Cheung (taurus-cheung) said :
#10

Hi Alexandre,

My use case is similar to yours: I want the VMs to be able to access the storage network on other interfaces of the network node, not through br-ex. Have you found a solution?

Regards,
Taurus

Alexandre Bécholey (alexandre-becholey) said :
#11

Hi Tcheung,

No, still no solution...

Regards,

Alexandre

Dirk Grunwald (grunwald) said :
#12

Alexandre, I have a similar setup, except with only one IB interface, and I'm also using GRE interfaces.

I've been trying to get the Nova/Quantum setup to use IB as the transport for the GRE tunnels. It seems to do that (according to tcpdump), but the bandwidth I'm getting is about 1 Gbit/s (bare-metal use of the IB interface is ~9 Gbit/s). Do you get better performance than that between VMs?
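For reference, I measured it with something like this between two VMs:

iperf -s                        # on the first VM
iperf -c <first-vm-ip> -t 30    # on the second VM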

I'm also curious how you got boot-from-volume working with Ceph/RBD and libvirt.

With Ubuntu 13.10 I seem to suffer from two problems with Ceph: one is a libvirt race condition that causes snapshots to fail (https://bugs.launchpad.net/nova/+bug/1244694), and the other is that, at least in Grizzly, I can't use Cinder to make a volume from anything that isn't a RAW image (a minor annoyance).

Are you using Ubuntu or some other base distro?

Marcantonio (marcantonio) said :
#13

Did you ever get this working? I'm trying to do something similar: my compute nodes have NICs on a storage network, and I want to mount an NFS share directly in my VMs over that network.
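Something like this from inside a VM (the server address and export path are placeholders):

mount -t nfs 172.16.1.50:/export/share /mnt/share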

Gaurav Goyal (gaurav.goyal) said :
#14

Dear all,

Do you have a solution to this problem? I am dealing with a similar situation.

Can you help with this problem?
