Possible delay in Quantum GRE and flows

Asked by Marco Colombo

Hi All,
I have a brand-new install of Grizzly on Ubuntu 12.04 with network_type gre.
I see a high RTT for the first packets when I try to ping the VM.
My VM has a private IP, 192.168.178.2, NATted to the public IP 185.21.172.18.
When I try to reach the public IP, I get this output:

PING 185.21.172.18 (185.21.172.18) 56(84) bytes of data.
64 bytes from 185.21.172.18: icmp_req=1 ttl=58 time=6546 ms
64 bytes from 185.21.172.18: icmp_req=2 ttl=58 time=5546 ms
64 bytes from 185.21.172.18: icmp_req=3 ttl=58 time=4546 ms
64 bytes from 185.21.172.18: icmp_req=4 ttl=58 time=3546 ms
64 bytes from 185.21.172.18: icmp_req=5 ttl=58 time=2546 ms
64 bytes from 185.21.172.18: icmp_req=6 ttl=58 time=1546 ms
64 bytes from 185.21.172.18: icmp_req=7 ttl=58 time=546 ms
64 bytes from 185.21.172.18: icmp_req=8 ttl=58 time=2.52 ms
64 bytes from 185.21.172.18: icmp_req=9 ttl=58 time=2.50 ms
64 bytes from 185.21.172.18: icmp_req=10 ttl=58 time=2.73 ms
64 bytes from 185.21.172.18: icmp_req=11 ttl=58 time=2.41 ms
64 bytes from 185.21.172.18: icmp_req=12 ttl=58 time=2.67 ms

It seems that the flows are created with some delay.
The output of the command ovs-dpctl dump-flows br-int | grep 212.29.130
is blank as soon as I start the ping;
when the ping starts working, the flows have been created successfully:

ovs-dpctl dump-flows br-int | grep 212.29.130

in_port(80),eth(src=fa:16:3e:99:1b:4c,dst=fa:16:3e:da:15:4c),eth_type(0x0800),ipv4(src=212.29.130.12,dst=192.168.178.2,proto=1,tos=0,ttl=58,frag=no),icmp(type=8,code=0), packets:7, bytes:686, used:0.276s, actions:push_vlan(vid=11,pcp=0),65

in_port(65),eth(src=fa:16:3e:da:15:4c,dst=fa:16:3e:99:1b:4c),eth_type(0x8100),vlan(vid=11,pcp=0),encap(eth_type(0x0800),ipv4(src=192.168.178.2,dst=212.29.130.12,proto=1,tos=0,ttl=64,frag=no),icmp(type=0,code=0)), packets:1, bytes:98, used:0.276s, actions:pop_vlan,80

If I ping the VM again while the flow still exists, ping works properly; otherwise the latency problem occurs again.
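A rough way to watch for the flow appearing while the ping runs in another terminal (the bridge name and VM IP below are the ones from this report; on a host without Open vSwitch the count simply stays at zero):

```shell
#!/bin/sh
# Count kernel datapath flows that match the VM's fixed IP.
count_vm_flows() {
    ovs-dpctl dump-flows br-int 2>/dev/null | grep -c '192.168.178.2'
}

# Poll with timestamps; the first nonzero count marks the moment
# the kernel flow was actually installed.
i=0
while [ "$i" -lt 5 ]; do
    printf '%s  flows=%s\n' "$(date +%H:%M:%S)" "$(count_vm_flows || true)"
    sleep 1
    i=$((i + 1))
done
```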

Am I doing something wrong? Can anyone help with this?
If you need other information, don't hesitate to ask.
Thanks

Question information

Language: English
Status: Solved
For: neutron
Assignee: No assignee
Solved by: Marco Colombo
Revision history for this message
Darragh O'Reilly (darragh-oreilly) said :
#1

Just to confirm: the first ping request packet is reaching br-int quickly, i.e. it is not getting slowed down outside OpenStack/Quantum and taking 6.5 seconds to reach the quantum router gateway?

Marco Colombo (colo90) said :
#2

Hi Darragh, thanks for the reply.
No, the quantum router gateway is reached quickly; see this log. The tcpdump was taken on the private interface of the quantum router.

09:35:59.143451 IP 212.29.130.12 > 192.168.178.2: ICMP echo request, id 15800, seq 1, length 64
09:35:59.836781 IP 212.29.130.12 > 192.168.178.2: ICMP echo request, id 15800, seq 2, length 64
09:36:00.844828 IP 212.29.130.12 > 192.168.178.2: ICMP echo request, id 15800, seq 3, length 64
09:36:01.852905 IP 212.29.130.12 > 192.168.178.2: ICMP echo request, id 15800, seq 4, length 64
09:36:02.355638 IP 192.168.178.2 > 212.29.130.12: ICMP echo reply, id 15800, seq 1, length 64
09:36:02.355710 IP 192.168.178.2 > 212.29.130.12: ICMP echo reply, id 15800, seq 2, length 64
09:36:02.355731 IP 192.168.178.2 > 212.29.130.12: ICMP echo reply, id 15800, seq 3, length 64
09:36:02.355760 IP 192.168.178.2 > 212.29.130.12: ICMP echo reply, id 15800, seq 4, length 64
09:36:02.854375 IP 212.29.130.12 > 192.168.178.2: ICMP echo request, id 15800, seq 5, length 64
09:36:02.855173 IP 192.168.178.2 > 212.29.130.12: ICMP echo reply, id 15800, seq 5, length 64
09:36:03.856016 IP 212.29.130.12 > 192.168.178.2: ICMP echo request, id 15800, seq 6, length 64
09:36:03.856978 IP 192.168.178.2 > 212.29.130.12: ICMP echo reply, id 15800, seq 6, length 64

and this is the ping output :

64 bytes from 185.21.172.18: icmp_req=1 ttl=58 time=3527 ms
64 bytes from 185.21.172.18: icmp_req=2 ttl=58 time=2520 ms
64 bytes from 185.21.172.18: icmp_req=3 ttl=58 time=1512 ms
64 bytes from 185.21.172.18: icmp_req=4 ttl=58 time=504 ms
64 bytes from 185.21.172.18: icmp_req=5 ttl=58 time=2.56 ms
64 bytes from 185.21.172.18: icmp_req=6 ttl=58 time=2.69 ms
64 bytes from 185.21.172.18: icmp_req=7 ttl=58 time=3.27 ms
64 bytes from 185.21.172.18: icmp_req=8 ttl=58 time=2.47 ms
64 bytes from 185.21.172.18: icmp_req=9 ttl=58 time=2.51 ms

The reply-packet flows (on the quantum server) are created 1-2 seconds (or more) after I start to ping the VM, and so we see the ping replies delayed.
I think that this is my problem, but I don't know how to solve it.

Thanks

Darragh O'Reilly (darragh-oreilly) said :
#3

Hi Marco,

OK - just wanted to check that the delay was within OVS.

I think the creation of new flows requires a context switch to vswitchd in userspace. Does the compute node have lots of free memory, or is it doing a lot of swapping? I see the port numbers are quite high (65 and 80) - how many instances is it currently running? Maybe check the vswitchd logs too.
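Those checks can be scripted on the compute node roughly like this (the memory and swap counters are read from /proc so it works on any Linux box; the log path is the usual Ubuntu default and may differ on other installs):

```shell
#!/bin/sh
# Free memory in MB, read from /proc/meminfo.
free_mb=$(awk '/^MemFree:/ {print int($2 / 1024)}' /proc/meminfo)
echo "free memory: ${free_mb} MB"

# Pages swapped in/out since boot; take two readings a minute apart -
# if these counters are climbing, the node is actively swapping.
grep -E '^pswp(in|out)' /proc/vmstat

# Recent vswitchd messages (Ubuntu default path).
tail -n 20 /var/log/openvswitch/ovs-vswitchd.log 2>/dev/null \
    || echo "no vswitchd log at the default path"
```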

Darragh.

Marco Colombo (colo90) said :
#4

Hi Darragh,
at the moment there are 16 instances.
The load average is 4 (is that too high?) and the free memory is about 1.5 GB.
This is the ovs-vswitchd.log:

May 06 09:51:13|82858|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on tap569e45aa-1c device failed: No such device
May 06 09:51:20|82859|netdev|WARN|Dropped 423 log messages in last 12 seconds (most recently, 1 seconds ago) due to excessive rate
May 06 09:51:20|82860|netdev|WARN|failed to get flags for network device tap4540e0ad-99: No such device
May 06 09:51:22|82861|netdev_linux|WARN|Dropped 119 log messages in last 9 seconds (most recently, 5 seconds ago) due to excessive rate
May 06 09:51:22|82862|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on tap569e45aa-1c device failed: No such device
May 06 09:51:32|82863|netdev|WARN|Dropped 633 log messages in last 12 seconds (most recently, 1 seconds ago) due to excessive rate
May 06 09:51:32|82864|netdev|WARN|failed to get flags for network device tap6364e554-7f: No such device
May 06 09:51:36|82865|netdev_linux|WARN|Dropped 239 log messages in last 15 seconds (most recently, 6 seconds ago) due to excessive rate
May 06 09:51:36|82866|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on tap569e45aa-1c device failed: No such device
May 06 09:51:44|82867|netdev|WARN|Dropped 416 log messages in last 12 seconds (most recently, 1 seconds ago) due to excessive rate
May 06 09:51:44|82868|netdev|WARN|failed to get flags for network device tapa625aa89-fd: No such device
May 06 09:51:46|82869|netdev_linux|WARN|Dropped 119 log messages in last 9 seconds (most recently, 6 seconds ago) due to excessive rate
May 06 09:51:46|82870|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on tap569e45aa-1c device failed: No such device

and this is the output of the command ovs-ofctl show br-int

OFPT_FEATURES_REPLY (xid=0x1): ver:0x1, dpid:0000ba56c69ab54d
n_tables:255, n_buffers:256
features: capabilities:0xc7, actions:0xfff
 1(patch-tun): addr:ba:9d:bd:68:35:92
     config: 0
     state: 0
 2(qr-21d03c13-85): addr:3c:11:ff:7f:00:00
     config: PORT_DOWN
     state: LINK_DOWN
 3(qr-2e9d596d-3b): addr:3c:11:ff:7f:00:00
     config: PORT_DOWN
     state: LINK_DOWN
 4(qr-c7c253a7-02): addr:3c:11:ff:7f:00:00
     config: PORT_DOWN
     state: LINK_DOWN
 5(qr-7e531375-39): addr:00:87:00:00:fa:16
     config: PORT_DOWN
     state: LINK_DOWN
 6(qr-83b0e978-a7): addr:3c:11:ff:7f:00:00
     config: PORT_DOWN
     state: LINK_DOWN
 7(qr-32b06b50-91): addr:fa:16:3e:2f:6a:91
     config: PORT_DOWN
     state: LINK_DOWN
 8(qr-da54554e-20): addr:fa:16:3e:6a:cd:80
     config: PORT_DOWN
     state: LINK_DOWN
 125(tap569e45aa-1c): addr:3c:11:ff:7f:00:00
     config: PORT_DOWN
     state: LINK_DOWN

All my ports are down. I already tried ovs-ofctl mod-port br-int <port> up, but it does not work.
Could this be the problem?

Thanks

Darragh O'Reilly (darragh-oreilly) said :
#5

It looks like vswitchd has a problem with tap569e45aa-1c - does that device exist in Linux? I guess this output is from the network node, as there are only qr-* devices and tap569e45aa-1c, which I guess is for DHCP?

Marco Colombo (colo90) said :
#6

Hi Darragh,
in Linux it does not exist, but it is there in OVS.

Thanks

Darragh O'Reilly (darragh-oreilly) said :
#7

Are you using IP namespaces? If so, you will need to find the namespace names with:

# ip netns

and then list their interfaces with:

# ip netns exec {name-space} ip link
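Combined, the two commands above can be turned into a small search for a given device across all namespaces (the device name below is the one from this thread; the ip netns exec step needs root):

```shell
#!/bin/sh
# Look for the device in the root namespace, then in every network
# namespace known to iproute2.
dev=tap569e45aa-1c

found=no
if ip link show "$dev" >/dev/null 2>&1; then
    echo "$dev exists in the root namespace"
    found=yes
fi

# `ip netns` may print "name (id: N)"; keep only the name column.
for ns in $(ip netns 2>/dev/null | awk '{print $1}'); do
    if ip netns exec "$ns" ip link show "$dev" >/dev/null 2>&1; then
        echo "$dev is in namespace $ns"
        found=yes
    fi
done

if [ "$found" = no ]; then
    echo "$dev not found"
fi
```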

Marco Colombo (colo90) said :
#8

Yes, I'm using namespaces.

ip netns exec qdhcp-7d4651b5-3030-4a54-a1e6-84145e77c4c4 ip link
49: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
1137: tap569e45aa-1c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether fa:16:3e:d7:9e:1a brd ff:ff:ff:ff:ff:ff

Thanks

Darragh O'Reilly (darragh-oreilly) said :
#9

Have you pinpointed the delay to be on the compute node or the network node? Are you working from any particular guide? I assume this is the ovs plugin?

Marco Colombo (colo90) said :
#10

Hi Darragh,
thanks for the support. I replaced my quantum server with one with a better CPU, and now the problem has been solved.
The load average on the new server is lower than on the old one.