VM can't get IP address when running a multi-node deployment with GRE tunneling

Asked by Lingfeng Xiong

Operating System: Ubuntu 12.10
Guide I followed: https://github.com/mseknibilel/OpenStack-Folsom-Install-guide/blob/GRE/2NICs/OpenStack_Folsom_Install_Guide_WebVersion.rst
Three-node deployment: a controller node, a quantum node and a compute node.
Quantum runs with openvswitch plugin and GRE tunnel enabled.

Problem Description:
When a VM is launched, it cannot get an IP address. If I configure an IP address manually, it still cannot ping the virtual network gateway. Also, the metadata server (169.254.169.254) is unreachable.

On the compute node, the configuration files are uploaded here:
nova-compute.conf:
http://pastebin.com/9Dk2jSMx
nova.conf:
http://pastebin.com/HR6KaDwJ

Console output while the VM is booting:
Initializing random number generator... done.
Starting network...
udhcpc (v1.18.5) started
Sending discover...
Sending discover...
Sending discover...
No lease, failing
WARN: /etc/rc3.d/S40-network failed
cloud-setup: checking http://169.254.169.254/2009-04-04/meta-data/instance-id
wget: can't connect to remote host (169.254.169.254): Network is unreachable
cloud-setup: failed 1/30: up 15.13. request failed
wget: can't connect to remote host (169.254.169.254): Network is unreachable
cloud-setup: failed 2/30: up 16.20. request failed

On the network node, I can see that dnsmasq is offering DHCP info to the VM:
Dec 16 15:17:01 os-network CRON[21955]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Dec 16 15:17:24 os-network dnsmasq[1695]: cleared cache
Dec 16 15:17:24 os-network dnsmasq-dhcp[1695]: read /var/lib/quantum/dhcp/99302650-9edf-44e8-adc7-21a8782f1dfd/host
Dec 16 15:17:24 os-network dnsmasq-dhcp[1695]: read /var/lib/quantum/dhcp/99302650-9edf-44e8-adc7-21a8782f1dfd/opts
Dec 16 15:17:50 os-network dnsmasq[1695]: cleared cache
Dec 16 15:17:50 os-network dnsmasq-dhcp[1695]: read /var/lib/quantum/dhcp/99302650-9edf-44e8-adc7-21a8782f1dfd/host
Dec 16 15:17:50 os-network dnsmasq-dhcp[1695]: read /var/lib/quantum/dhcp/99302650-9edf-44e8-adc7-21a8782f1dfd/opts
Dec 16 15:17:50 os-network dnsmasq[1695]: cleared cache
Dec 16 15:17:50 os-network dnsmasq-dhcp[1695]: read /var/lib/quantum/dhcp/99302650-9edf-44e8-adc7-21a8782f1dfd/host
Dec 16 15:17:50 os-network dnsmasq-dhcp[1695]: read /var/lib/quantum/dhcp/99302650-9edf-44e8-adc7-21a8782f1dfd/opts
Dec 16 15:18:07 os-network dnsmasq-dhcp[1695]: DHCPDISCOVER(tapcd468438-56) fa:16:3e:9d:e2:17
Dec 16 15:18:07 os-network dnsmasq-dhcp[1695]: DHCPOFFER(tapcd468438-56) 50.50.1.3 fa:16:3e:9d:e2:17
Dec 16 15:18:10 os-network dnsmasq-dhcp[1695]: DHCPDISCOVER(tapcd468438-56) fa:16:3e:9d:e2:17
Dec 16 15:18:10 os-network dnsmasq-dhcp[1695]: DHCPOFFER(tapcd468438-56) 50.50.1.3 fa:16:3e:9d:e2:17
Dec 16 15:18:13 os-network dnsmasq-dhcp[1695]: DHCPDISCOVER(tapcd468438-56) fa:16:3e:9d:e2:17
Dec 16 15:18:13 os-network dnsmasq-dhcp[1695]: DHCPOFFER(tapcd468438-56) 50.50.1.3 fa:16:3e:9d:e2:17
Dec 16 15:19:42 os-network dnsmasq[1695]: cleared cache
Dec 16 15:19:42 os-network dnsmasq-dhcp[1695]: read /var/lib/quantum/dhcp/99302650-9edf-44e8-adc7-21a8782f1dfd/host
Dec 16 15:19:42 os-network dnsmasq-dhcp[1695]: read /var/lib/quantum/dhcp/99302650-9edf-44e8-adc7-21a8782f1dfd/opts

However, I have no idea why the VM does not receive that DHCPOFFER, or why it still cannot reach the gateway or other hosts after I configure its network manually.
If more information is required, please let me know, and I will add it as soon as possible.
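
The log above shows repeated DHCPDISCOVER/DHCPOFFER pairs with no DHCPREQUEST or DHCPACK, which suggests the offers never reach the VM. A rough sketch for summarizing this from syslog-style lines follows; the helper name `dhcp_summary` is hypothetical, and the log path in the usage note is an assumption about where dnsmasq logs on the network node.

```shell
#!/bin/sh
# Count dnsmasq DHCP message types per client MAC address.
# Reads syslog-style lines on stdin; dhcp_summary is a hypothetical helper name.
dhcp_summary() {
    awk '
        /dnsmasq-dhcp/ && match($0, /DHCP(DISCOVER|OFFER|REQUEST|ACK)/) {
            type = substr($0, RSTART, RLENGTH)
            mac = ""
            # A MAC address is the only 17-character field made of hex digits and colons.
            for (i = 1; i <= NF; i++)
                if (length($i) == 17 && $i ~ /^[0-9a-fA-F:]+$/) mac = $i
            if (mac != "") count[mac " " type]++
        }
        END { for (key in count) print key, count[key] }
    ' | sort
}
```

For example, `dhcp_summary < /var/log/syslog` (path assumed) prints one line per MAC and message type. A client that shows only DHCPDISCOVER and DHCPOFFER counts is being answered by dnsmasq but is not completing the exchange.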

Any hints are appreciated.

Question information

Language: English
Status: Solved
For: neutron
Assignee: No assignee
Solved by: Lingfeng Xiong
Revision history for this message
Joshua Dotson (tns9) said :
#1

On the network node, try 'ifconfig tapcd468438-56 up'. If the VM gets an IP after that, you're running into the same problem I am. I'm still looking for a permanent solution to my problem. I rebuilt my three-node cluster mentioned in a recent Launchpad question, but came up with the same issue.
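
A minimal sketch of that check, assuming the `ip` tool from iproute2 is available on the network node. The function names are hypothetical; `ip link set dev ... up` has the same effect as the `ifconfig ... up` command above, and both need root.

```shell
#!/bin/sh
# Succeeds if the flags field (<...>) of `ip link show` output on stdin
# contains UP as a whole flag (LOWER_UP alone does not count).
link_flags_have_up() {
    grep -Eq '<([A-Z_-]+,)*UP(,[A-Z_-]+)*>'
}

# Bring the given interface up only if it is administratively down.
ensure_link_up() {
    dev=$1
    if ip link show dev "$dev" | link_flags_have_up; then
        echo "$dev is already up"
    else
        ip link set dev "$dev" up
    fi
}
```

Usage on the network node would be `ensure_link_up tapcd468438-56`, with the tap name taken from the dnsmasq log.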

-Joshua

Lingfeng Xiong (xionglingfeng) said :
#2

Thanks Joshua,
Now my VM can get an IP address. However, it gets stuck at the last line of this output:

Sending discover...
Sending select for 50.50.1.4...
Lease of 50.50.1.4 obtained, lease time 120
deleting routers
route: SIOCDELRT: No such process
adding dns 50.50.1.2
cloud-setup: checking http://169.254.169.254/2009-04-04/meta-data/instance-id

It seems it cannot find the metadata server? The VM just hangs there with no error info.

Lingfeng Xiong (xionglingfeng) said :
#3

After waiting a few minutes, it timed out:

Starting logging: OK
Initializing random number generator... done.
Starting network...
udhcpc (v1.18.5) started
Sending discover...
Sending select for 50.50.1.4...
Lease of 50.50.1.4 obtained, lease time 120
deleting routers
route: SIOCDELRT: No such process
adding dns 50.50.1.2
cloud-setup: checking http://169.254.169.254/2009-04-04/meta-data/instance-id
wget: can't connect to remote host (169.254.169.254): Connection timed out
cloud-setup: failed 1/30: up 5.68. request failed

Lingfeng Xiong (xionglingfeng) said :
#4

Hi Joshua,
The problem has been solved.
I changed the network node settings and switched to using namespaces in both the dhcp-agent and the l3-agent. After this, I cleaned out all routers and networks in quantum and did these steps:
1. create a tenant private network with private IP addresses
2. create a corresponding subnet for the private network
3. create a public network
4. create a corresponding subnet for that public network. Remember to set the start and end of the IP pool and to disable DHCP on it
5. create a router
6. add an interface on the private network to that router
7. set the gateway for that router to the public network
8. run
quantum port-list
on the CONTROLLER node and look for the router's public IP address. You should be able to ping it.
9. add a route on the CONTROLLER node for the private network you added.
route add -net PRIVATE_NET/MASK gw ROUTER_PUBLIC_IP_ADDR
10. enjoy :-)
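
The steps above can be sketched as a dry-run script. This is only an illustration under stated assumptions: the network and router names, the public CIDR, the pool addresses and the ID placeholders are all made up, the /24 on the private network is a guess based on the 50.50.1.x leases, and the flag spellings follow the Folsom-era quantum client as used in install guides of that time, so verify them against your client's built-in help before running anything for real.

```shell
#!/bin/sh
# Dry-run sketch of steps 1-9 from the post above.
# Assumptions (not from the thread): names, public CIDR, pool range, ID placeholders.
# Prerequisite from the post: namespaces enabled in both the dhcp-agent and the l3-agent.

PRIVATE_CIDR=${PRIVATE_CIDR:-50.50.1.0/24}
PUBLIC_CIDR=${PUBLIC_CIDR:-192.0.2.0/24}
POOL_START=${POOL_START:-192.0.2.100}
POOL_END=${POOL_END:-192.0.2.150}
PRIVATE_SUBNET_ID=${PRIVATE_SUBNET_ID:-PRIVATE_SUBNET_ID}
EXT_NET_ID=${EXT_NET_ID:-EXT_NET_ID}
ROUTER_PUBLIC_IP=${ROUTER_PUBLIC_IP:-ROUTER_PUBLIC_IP_ADDR}

# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

setup_steps() {
    run quantum net-create private-net                              # step 1
    run quantum subnet-create private-net "$PRIVATE_CIDR"           # step 2
    run quantum net-create ext-net --router:external=True           # step 3
    run quantum subnet-create ext-net "$PUBLIC_CIDR" \
        --allocation-pool "start=$POOL_START,end=$POOL_END" \
        --enable_dhcp=False                                         # step 4
    run quantum router-create router1                               # step 5
    run quantum router-interface-add router1 "$PRIVATE_SUBNET_ID"   # step 6
    run quantum router-gateway-set router1 "$EXT_NET_ID"            # step 7
    run quantum port-list                                           # step 8
    run route add -net "$PRIVATE_CIDR" gw "$ROUTER_PUBLIC_IP"       # step 9, on the controller
}
```

Calling `setup_steps` prints the nine commands so they can be reviewed and filled in with real IDs. The IDs for steps 6, 7 and 9 only exist after the earlier steps have run, which is why they are left as placeholders here.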

If you need to troubleshoot something, run:
ip netns list
This command returns a list of existing namespaces. The one you are looking for should start with 'qrouter'.
Then run commands in that namespace like this:
ip netns exec NAMESPACE ifconfig
ip netns exec NAMESPACE ping x.x.x.1
ip netns exec NAMESPACE ping x.x.x.2
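
The commands above can be wrapped in a small loop, roughly like this. The function names are hypothetical, and the target address is whatever gateway or VM IP you want to test.

```shell
#!/bin/sh
# Keep only router namespaces from `ip netns list` output on stdin.
# Taking the first field also handles newer iproute2 output such as "name (id: 0)".
router_namespaces() {
    awk '$1 ~ /^qrouter-/ { print $1 }'
}

# Show interfaces and ping a target from inside every router namespace.
diagnose_router_ns() {
    target_ip=$1
    ip netns list | router_namespaces | while read -r ns; do
        echo "== $ns =="
        ip netns exec "$ns" ifconfig </dev/null
        ip netns exec "$ns" ping -c 3 "$target_ip" </dev/null
    done
}
```

Run as root on the network node, for example `diagnose_router_ns 50.50.1.1` if that is the private network's gateway address.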

Hope this is helpful.

BTW: namespace = false is hell :-(

Joshua Dotson (tns9) said :
#5

Hi,

Thanks very much for the help! I've completed all of your steps, but I'm curious about which node the route should be added on. My L3 and DHCP agents run on the network node, which has a public interface. Is the route on the control node being added to give the control node's metadata service a path back to the L3 external router on the network node?

Is the following link correct in stating that Security Groups and Nova Metadata do not work now that we've enabled namespaces and, thereby, overlapping IPs?

http://docs.openstack.org/trunk/openstack-network/admin/content/ch_limitations.html

Joshua

Joshua Dotson (tns9) said :
#6

I created the route on the control node and it worked. Wow. This is crazy..lol...

Thanks again,
Joshua

Lingfeng Xiong (xionglingfeng) said :
#7

Hi Joshua,
I'm very sorry, I've been deploying OpenStack to a production environment these days... too busy.

As for your question about why we create a route on the controller node to the private network:

http://docs.openstack.org/trunk/openstack-network/admin/content/adv_cfg_l3_agent_metadata.html
OpenStack does not manage this routing for you, so you need to make sure that your host running the metadata service always has a route to reach each private network's subnet via the external network IP of that subnet's quantum router. To do this, you can either run quantum without namespaces, and run the quantum-l3-agent on the same host as nova-api. Otherwise, you can identify an IP prefix that includes all private network subnet's (e.g., 10.0.0.0/8) and then make sure that your metadata server has a route for that prefix with the quantum router's external IP address as the next hop.
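
The last option in the quoted passage (one covering prefix routed via the router's external IP) can be sanity-checked before adding the route. A minimal sketch, assuming plain IPv4 in dotted-quad form without leading zeros; the function names and the 192.0.2.101 address in the usage note are hypothetical.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
# Runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeeds if the first prefix contains the second, e.g.
#   prefix_covers 10.0.0.0/8 10.5.5.0/24
prefix_covers() {
    outer_net=${1%/*}; outer_len=${1#*/}
    inner_net=${2%/*}; inner_len=${2#*/}
    [ "$inner_len" -ge "$outer_len" ] || return 1
    mask=$(( (0xFFFFFFFF << (32 - $outer_len)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$outer_net") & $mask )) -eq $(( $(ip_to_int "$inner_net") & $mask )) ]
}

# Print the route command for the metadata host; it is not executed here.
metadata_route_cmd() {
    echo "route add -net $1 gw $2"
}
```

For instance, `prefix_covers 50.50.0.0/16 50.50.1.0/24 && metadata_route_cmd 50.50.0.0/16 192.0.2.101` prints the route to add on the host running the metadata service, where 192.0.2.101 stands in for the quantum router's external IP.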

marvel (marvelliu) said :
#8

Must the router be created on the control node?

I created the router on the network node, but the VMs cannot access the external network.

Lingfeng Xiong (xionglingfeng) said :
#9

Hi marvel,
It makes no difference whether you create the router from the controller node, another node, or even a PC running Windows. The "quantum" command you invoke is just a client that connects to the quantum server and sends your commands there. So wherever you invoke quantum, nova, keystone or other OpenStack commands, they always run on the corresponding nodes :-)
If your VM cannot access the external network, first check whether it can obtain a correct private IP address. If so, run these commands:
1. run "ip netns list" to show your namespaces
2. you should see a namespace starting with qrouter. Copy it
3. run "ip netns exec THE_NAMESPACE_YOU_JUST_COPIED ifconfig"
"ip netns exec THE_NAMESPACE_YOU_JUST_COPIED route"
"ip netns exec THE_NAMESPACE_YOU_JUST_COPIED ping YOUR_PRIVATE_NET_GATEWAY" // usually x.x.x.1, like 50.50.50.1
"ip netns exec THE_NAMESPACE_YOU_JUST_COPIED ping YOUR_VM_IP_ADDRESS_IN_PRIVATE_NET"
and try to diagnose your problem.

marvel (marvelliu) said :
#10

Thanks Lingfeng

I can ping the VM and the internal gateway, but I cannot ping the external network gateway.

Mine seems to be an l3-agent problem; I have described it in the following bug report:
https://bugs.launchpad.net/quantum/+bug/1092763

Can you take a look at it? Thanks.