Cluster in waiting state

Asked by Savanna

The cluster remains in the "waiting" state even after its master and worker node instances turn "active". I used the Fuel installer and Grizzly on CentOS; I enabled auto-assign floating IPs in the network settings (Nova, FlatDHCP) and set Nova in the Savanna settings for the Hortonworks plugin. (I even checked that the network settings are correct in Fuel.) When I open the console of any node instance, it says "Booting from hard disk", and the console log is empty. Can someone please tell me what can be done?

(Aim: to deploy the HDP 1.3 plugin on Savanna 0.2/0.2.2.)

Question information

Language:
English
Status:
Answered
For:
Sahara
Assignee:
No assignee

This question was originally filed as bug #1265019.

This question was reopened

Matthew Farrellee (mattf) said :
#1

Converting to a question

Matthew Farrellee (mattf) said :
#2

There are many things that could be wrong here, and most of the time they are independent of Savanna, i.e. a misconfiguration of OpenStack.

The first thing Savanna does to start a cluster is SSH into the instances. To rule out Savanna as the issue, start an instance manually (not with Savanna) and make sure you can SSH into it.

If you can't, the issue is definitely in your OpenStack configuration.
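For the manual check, a minimal Python 3 sketch (the function name and timeout are illustrative, not part of Savanna). It assumes the instance's IP is reachable from the host running Savanna, opens a TCP connection to port 22 and reads the server's identification line, which per the SSH transport spec (RFC 4253) starts with "SSH-". It only tests that sshd answers; it does not test key-based login.

```python
import socket


def read_ssh_banner(host, port=22, timeout=5.0):
    """Connect to host:port and return the SSH identification line.

    Raises OSError (timeout, connection refused, no route) if the port is
    unreachable, and ValueError if no SSH banner comes back.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # The identification line ends with CR LF; read until a LF arrives
        # (the spec limits the line to 255 bytes).
        while b"\n" not in data and len(data) < 255:
            chunk = sock.recv(256)
            if not chunk:
                # Peer closed the connection before sending anything useful.
                raise ValueError("connection closed before SSH banner")
            data += chunk
    # For simplicity only the first line is checked; RFC 4253 lets a server
    # send other lines first, which this sketch would reject.
    line = data.split(b"\n", 1)[0].rstrip(b"\r").decode("ascii", "replace")
    if not line.startswith("SSH-"):
        raise ValueError("unexpected banner: %r" % line)
    return line
```

For example, read_ssh_banner("192.168.200.145") should return a string beginning with "SSH-2.0-"; a timeout or refused connection points back at the OpenStack network setup rather than at Savanna.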

If you can, look in /var/log/savanna/api.log for any warnings or errors.

If you see nothing, add debug=true to your savanna.conf and provide us with your configuration and some logs.
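A rough helper for skimming that log, assuming the default path above and the usual OpenStack log layout (date, time, PID, level, logger) that also appears in the log excerpts later in this thread. It keeps only WARNING, ERROR and CRITICAL lines; multi-line tracebacks are skipped because their continuation lines carry no level. debug=true goes in the [DEFAULT] section of the config file; once it is set, passing levels=("DEBUG",) also surfaces the engine's login attempts.

```python
import re

# Level field of a log line such as
# "2014-06-12 19:54:25.915 23444 DEBUG sahara.service.engine [-] ..."
LINE_RE = re.compile(r"^\S+ \S+ \d+ (?P<level>[A-Z]+) ")


def filter_log(lines, levels=("WARNING", "ERROR", "CRITICAL")):
    """Yield lines (without the trailing newline) whose level is in `levels`."""
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") in levels:
            yield line.rstrip("\n")


def print_problems(path="/var/log/savanna/api.log",
                   levels=("WARNING", "ERROR", "CRITICAL")):
    """Print the matching lines from a log file."""
    with open(path) as log:
        for line in filter_log(log, levels):
            print(line)
```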

Savanna (xyzxyzxyz) said :
#5

Thank you so much for the reply. I have a few more questions after a lot of digging:

1. I checked "install Savanna" in Fuel and have a "Savanna" tab in my Horizon UI. Is this installation of Savanna enough, or should I download anything else for Savanna to work properly?

2. Is there anything like a "Savanna dashboard", i.e. a Savanna-only UI, that I can install?

Matthew Farrellee (mattf) said :
#6

Please open new questions instead of continuing an existing/answered question. Doing so will help the entire community find and address issues.

Sunzen Wang (sunzen) said :
#7

Matthew Farrellee,

Thank you for your information.

In my environment (Icehouse on CentOS 6.5, with a FlatDHCP network built on ML2/Open vSwitch) SSH into VM instances works, but the cluster always stays in the waiting state after launch.

After adding debug=true to sahara.conf, the relevant log record is as follows:

2014-06-12 19:54:25.915 23444 DEBUG sahara.service.engine [-] Can't login to node minione-minione121-001 (192.168.200.145), reason RuntimeError: Neutron router not found corresponding to network fd76b58c-3733-461a-b7ac-3b823c460e48 _wait_until_accessible /usr/lib/python2.6/site-packages/sahara/service/engine.py:95

I used FlatDHCP, so I didn't configure a Neutron router.
Why is a Neutron router necessary? How should I handle this for a FlatDHCP network?

I tried to configure a default router; the relevant log records are as follows:
2014-06-12 20:02:04.858 23444 DEBUG sahara.service.engine [-] Can't login to node minione-minione121-001 (192.168.200.145), reason EOFError: _wait_until_accessible /usr/lib/python2.6/site-packages/sahara/service/engine.py:95
2014-06-12 20:02:10.689 23444 DEBUG sahara.service.engine [-] Can't login to node minione-minione121-001 (192.168.200.145), reason EOFError: _wait_until_accessible /usr/lib/python2.6/site-packages/sahara/service/engine.py:95
2014-06-12 20:02:16.516 23444 DEBUG sahara.service.engine [-] Can't login to node minione-minione121-001 (192.168.200.145), reason SSHException: Error reading SSH protocol banner _wait_until_accessible /usr/lib/python2.6/site-packages/sahara/service/engine.py:95
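To compare runs like the ones above, a small sketch that tallies the engine's "Can't login to node" messages by node, IP and exception type; the regular expression is written against these excerpts only and is an assumption about the message format. In the first run the RuntimeError is raised while Sahara looks up a Neutron router for the network, before any SSH traffic. After a router was added, the EOFError and "Error reading SSH protocol banner" failures look like errors from the SSH client library while it waits for the server's identification line, so the problem moved to the SSH connection itself.

```python
import re
from collections import Counter

# Written against the debug lines quoted in this thread, e.g.
# "... Can't login to node minione-minione121-001 (192.168.200.145), reason EOFError: ..."
LOGIN_RE = re.compile(
    r"Can't login to node (?P<node>\S+) \((?P<ip>[^)]+)\), reason (?P<reason>\w+)"
)


def login_failures(lines):
    """Count failed logins per (node, ip, exception type)."""
    counts = Counter()
    for line in lines:
        match = LOGIN_RE.search(line)
        if match:
            counts[match.group("node", "ip", "reason")] += 1
    return counts
```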

I'm still not good at OpenStack networking.
Thank you for any further guidance.

Regards

Dmitry Mescheryakov (dmitrymex) said :
#9

Sunzen,

Can you show your config? Also try setting
use_namespaces=false

and also if you don't have floating IPs in your setup, set
use_floating_ips=false
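Both options belong in the [DEFAULT] section of sahara.conf (savanna.conf before the rename). A minimal sketch that applies them with Python's standard configparser, assuming a plain INI-style file; configparser does not keep comments, so back the file up first (or simply edit it by hand). Restart the Sahara API service afterwards so the new values are read.

```python
import configparser


def set_network_flags(path, use_namespaces=False, use_floating_ips=False):
    """Write use_namespaces / use_floating_ips into [DEFAULT] of an INI-style file.

    Caveats: comments are lost on rewrite, and with strict=False a repeated
    key keeps only its last value.
    """
    cfg = configparser.ConfigParser(interpolation=None, strict=False)
    cfg.optionxform = str  # keep option names exactly as written
    cfg.read(path)
    cfg["DEFAULT"]["use_namespaces"] = str(use_namespaces).lower()
    cfg["DEFAULT"]["use_floating_ips"] = str(use_floating_ips).lower()
    with open(path, "w") as conf:
        cfg.write(conf)
```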

Also, please in the future use ask.openstack.org. We abandoned answers.launchpad.net half a year ago.

Sunzen Wang (sunzen) said :
#10

Dmitry,

Thank you for your response.

I tried setting use_namespaces=false, but it still doesn't work.

I have floating IPs, and once tried setting use_floating_ips=true, but launching fails with "node group Xxxx is missing 'floating_ip_pool'".
I couldn't find a way to set floating_ip_pool for a node group, so in the end I set use_floating_ips=false.
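The rule behind that message can be sketched as follows: when use_floating_ips is true, every node group has to name a floating IP pool, which is why launching failed once the option was switched on. This is a hypothetical illustration; the dictionary layout and function name are assumptions, not Sahara's validation code.

```python
def missing_floating_ip_pools(node_groups, use_floating_ips):
    """Return names of node groups that lack a floating_ip_pool.

    Hypothetical layout: each node group is a dict such as
    {"name": "master", "floating_ip_pool": "<pool id>"}.
    """
    if not use_floating_ips:
        # Without floating IPs, instances are reached via their internal IPs,
        # so no pool is required.
        return []
    return [ng["name"] for ng in node_groups if not ng.get("floating_ip_pool")]
```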

I will use ask.openstack.org from now on and post more information about my config.
Thank you for your attention.

Regards

Dmitry Mescheryakov (dmitrymex) said :
#11

As per
https://sahara.readthedocs.org/en/latest/horizon/installation.guide.html

Find the Horizon settings file local_settings.py and set the following two values there:
SAHARA_USE_NEUTRON = True
AUTO_ASSIGNMENT_ENABLED = False

Then restart Horizon.

After that, a 'Floating IP pool' dropdown will appear in the Node Group creation window.
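Since local_settings.py is ordinary Python, one way to double-check the two values without importing Horizon (which would pull in Django) is to parse the file with the standard ast module. A minimal sketch; it only understands simple NAME = literal assignments at module level, and the file's location depends on your distribution.

```python
import ast


def read_flags(source, names=("SAHARA_USE_NEUTRON", "AUTO_ASSIGNMENT_ENABLED")):
    """Return the literal values assigned at module level to `names`.

    Only simple NAME = <literal> assignments are recognised; the last
    assignment wins, as it would when the module is imported.
    """
    found = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id in names:
                    try:
                        found[target.id] = ast.literal_eval(node.value)
                    except ValueError:
                        found[target.id] = None  # not a plain literal
    return found
```

Usage: open the settings file, pass its text to read_flags, and expect {"SAHARA_USE_NEUTRON": True, "AUTO_ASSIGNMENT_ENABLED": False} before restarting Horizon.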

Sunzen Wang (sunzen) said :
#12

Dmitry,

It does work!
Thanks a lot for your help.

However, the installation guide seems a little misleading:
"If you are using Nova-Network with auto_assign_floating_ip=False add the following parameter:"
I'm using Neutron, so why should I care about a configuration entry related to Nova-Network? That's why I mostly skipped it before.

Another question: what's the correct way to provision a Hadoop cluster on a provider FlatDHCP network?
In this case use_floating_ips should be false; what about the other settings?
I tried both AUTO_ASSIGNMENT_ENABLED=False and AUTO_ASSIGNMENT_ENABLED=True,
but the Hadoop cluster always hangs in the waiting state.

I'm sorry that we are continuing at answers.launchpad.net.

Regards
Sunzen

Dmitry Mescheryakov (dmitrymex) said :
#13

Sunzen,

>> I'm using Neutron, why should i care configuration entry which is related with Nova-network? so previously i mainly skipped it.

Agreed. Right now we are in the process of merging sahara-dashboard into Horizon, and this parameter is set to False there by default.

>> Another question, what's the correct way to provision hadoop cluster on provider FlatDHCP network? This time, use_floating_ips should be false, how about others?

Hmm, I'm not familiar with that mode in Neutron. Can one use floating IPs in FlatDHCP mode? If yes, then you can still use use_floating_ips=true.

If the answer is no, then it gets complicated. You see, Sahara needs to connect to the VMs over SSH to configure them, and floating IPs are the easiest way to reach them. With Neutron, the controller node generally has no connection to private networks. There is a hack in Sahara that lets it connect to a Neutron private network, but it might not work. Other people implemented it, so I do not understand it fully. Set
use_floating_ips=false
use_namespaces=true

The requirement here is that Sahara runs on the OpenStack controller node _and_ all Neutron 'controller' processes run on that same node. This is easy to achieve if you have only one OpenStack controller. If you use HA, it may or may not work.
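To make the namespace hack (and the earlier "Neutron router not found" error) more concrete: on the node running Neutron's L3 agent, each router lives in a network namespace named qrouter-<router id> that has an interface on the tenant network, so a process on that host can reach private IPs by running commands inside the namespace. The sketch below illustrates that idea under those assumptions and is not Sahara's code. It finds the router attached to a network from Neutron port records (router interfaces have device_owner "network:router_interface" and device_id set to the router id) and builds an "ip netns exec ... nc <host> 22" command that an SSH client could use as a proxy. If no router interface exists on the network there is no namespace to go through; the error message is copied from the log above to show the correspondence.

```python
def find_router_id(ports, network_id):
    """Return the id of a router with an interface on `network_id`.

    `ports` is a list of Neutron port dicts, as returned by a port listing.
    """
    for port in ports:
        if (port.get("network_id") == network_id
                and port.get("device_owner") == "network:router_interface"):
            return port["device_id"]
    raise RuntimeError(
        "Neutron router not found corresponding to network %s" % network_id)


def namespace_proxy_command(router_id, host, port=22):
    """Build a command that relays a TCP connection through the router namespace.

    Assumes iproute2 and netcat (nc) on the host that runs it, plus enough
    privileges to enter network namespaces.
    """
    return ["ip", "netns", "exec", "qrouter-%s" % router_id, "nc", host, str(port)]
```

With paramiko, for instance, " ".join(cmd) could be wrapped in paramiko.ProxyCommand and passed as the sock argument of SSHClient.connect. Because the qrouter namespace exists only on the host where the L3 agent runs, this only works when Sahara runs on that same host.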
