Sahara OSTF check fails at "Send request to launch cluster"

Asked by abdul hannan ghafoor

Hey, my setup is 3 controllers (one of them also has the Cinder role), 1 compute node, and 3 MongoDB nodes. I am using Fuel 6, and my environment is HA with Ceilometer and Sahara enabled. I am having problems with Sahara. I used sahara-juno-vanilla-1.2.1-ubuntu-14.04.qcow2 as the Sahara image.
I am getting the following error in the health checks in the Fuel web UI:

"Failed to launch cluster. Please refer to OpenStack logs for more details."

My sahara-all.log is as follows:

<14>Feb 9 06:13:08 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:13:08] "GET /v1.1/97d1c17d11a24c89a07a33675457501c/plugins HTTP/1.1" 200 1046 0.109315
<14>Feb 9 06:13:09 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:13:09] "GET /v1.1/97d1c17d11a24c89a07a33675457501c/plugins/vanilla/1.2.1 HTTP/1.1" 200 285447 0.082070
<14>Feb 9 06:13:09 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:13:09] "GET /v1.1/97d1c17d11a24c89a07a33675457501c/plugins/hdp/1.3.2 HTTP/1.1" 200 106422 0.052905
<14>Feb 9 06:13:32 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:13:32 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:13:32] "GET /v1.1/97d1c17d11a24c89a07a33675457501c/images HTTP/1.1" 200 621 0.216582
<14>Feb 9 06:13:34 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:13:34 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:13:35 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:13:35 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:13:35] "POST /v1.1/97d1c17d11a24c89a07a33675457501c/images/f558f785-9df4-4ea0-a88e-642c18ec630d/tag HTTP/1.1" 202 583 1.220409
<14>Feb 9 06:13:41 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:13:41] "GET /v1.1/97d1c17d11a24c89a07a33675457501c/plugins HTTP/1.1" 200 1046 0.006036
<14>Feb 9 06:13:42 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:13:42] "GET /v1.1/97d1c17d11a24c89a07a33675457501c/plugins/vanilla/1.2.1 HTTP/1.1" 200 285447 0.084716
<14>Feb 9 06:13:42 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:13:42] "GET /v1.1/97d1c17d11a24c89a07a33675457501c/plugins/hdp/1.3.2 HTTP/1.1" 200 106422 0.056110
<14>Feb 9 06:13:53 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:13:53 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:13:53 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:13:53] "GET /v1.1/97d1c17d11a24c89a07a33675457501c/images/f558f785-9df4-4ea0-a88e-642c18ec630d HTTP/1.1" 200 577 0.464584
<14>Feb 9 06:13:55 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:13:55 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:13:56 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:13:56 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:13:56] "POST /v1.1/97d1c17d11a24c89a07a33675457501c/images/f558f785-9df4-4ea0-a88e-642c18ec630d/tag HTTP/1.1" 202 623 1.135081
<14>Feb 9 06:14:04 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:14:04 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:14:05 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:14:06 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:14:07 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:14:07 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:14:08 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:14:08 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:14:08] "DELETE /v1.1/97d1c17d11a24c89a07a33675457501c/images/afac6488-0ece-495e-bc35-32600d537518 HTTP/1.1" 204 115 4.252639
<14>Feb 9 06:14:29 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:14:30 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:14:30] "GET /v1.1/97d1c17d11a24c89a07a33675457501c/images HTTP/1.1" 200 620 0.542594
<14>Feb 9 06:15:24 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:15:24] "DELETE /v1.0/97d1c17d11a24c89a07a33675457501c/cluster-templates/84457bd1-c010-414d-9ee7-ca76f6b3ed9c HTTP/1.1" 204 115 0.325205
<14>Feb 9 06:15:48 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:15:48] "GET /v1.1/97d1c17d11a24c89a07a33675457501c/node-group-templates HTTP/1.1" 200 6033 0.328521
<14>Feb 9 06:17:31 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:17:31] "GET /v1.1/97d1c17d11a24c89a07a33675457501c/cluster-templates HTTP/1.1" 200 6018 0.055037
<14>Feb 9 06:17:36 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:17:36] "GET /v1.1/97d1c17d11a24c89a07a33675457501c/plugins/vanilla/2.4.1 HTTP/1.1" 200 340598 0.083768
<14>Feb 9 06:17:52 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:17:52] "GET /v1.1/97d1c17d11a24c89a07a33675457501c/cluster-templates HTTP/1.1" 200 6018 0.053145
<14>Feb 9 06:18:20 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:18:20] "GET /v1.1/97d1c17d11a24c89a07a33675457501c/plugins HTTP/1.1" 200 1046 0.006842
<14>Feb 9 06:18:27 node-28 sahara-all 192.168.0.2 - - [09/Feb/2015 06:18:27] "GET /v1.1/97d1c17d11a24c89a07a33675457501c/plugins/vanilla/1.2.1 HTTP/1.1" 200 285447 0.076093

Any idea what the problem could be here?
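Each sahara-all line in the log above starts with a syslog priority value in angle brackets. Under the usual syslog encoding, PRI = facility × 8 + severity, so `<14>` decodes to facility 1 (user-level) with severity 6 (informational). Every line in this first excerpt is `<14>`, so it contains no error-level messages from sahara-all. A minimal Python sketch for decoding the prefix and keeping only error-level lines follows; the function names are illustrative, not part of any Sahara or Fuel tool, and lines without a prefix (such as the TRACE continuation lines) are simply skipped.

```python
import re
from typing import Iterable, Iterator, Tuple

# Syslog PRI prefix at the start of a line, e.g. "<14>".
PRI_RE = re.compile(r"^<(\d{1,3})>")

# Standard syslog severity names, indexed by severity value 0..7.
SEVERITY_NAMES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def decode_pri(pri: int) -> Tuple[int, int]:
    """Split a syslog PRI value into (facility, severity), since PRI = facility * 8 + severity."""
    return divmod(pri, 8)


def error_lines(lines: Iterable[str], max_severity: int = 3) -> Iterator[Tuple[str, str]]:
    """Yield (severity_name, line) for lines at severity "err" (3) or more urgent.

    Lines without a PRI prefix (e.g. TRACE continuation lines) are skipped.
    """
    for line in lines:
        m = PRI_RE.match(line)
        if not m:
            continue
        _facility, severity = decode_pri(int(m.group(1)))
        if severity <= max_severity:
            yield SEVERITY_NAMES[severity], line.rstrip("\n")


if __name__ == "__main__":
    # Sample lines copied from the sahara-all.log excerpts in this thread.
    sample = [
        "<14>Feb 9 06:57:22 node-28 sahara-all Cluster status has been changed: "
        "id=bff4e166-0054-4e59-8f9d-846f05d9d0bc, New status=Waiting",
        "<11>Feb 9 06:57:23 node-28 sahara-all Error during operating cluster 'test' "
        "(reason: Node test-master-001 has error status)",
        "2015-02-09 06:57:23.362 7918 TRACE sahara.service.ops Traceback (most recent call last):",
    ]
    print(decode_pri(14))  # (1, 6): user facility, info
    print(decode_pri(11))  # (1, 3): user facility, err
    for name, line in error_lines(sample):
        print(name, line)
```

To scan a real log, pass an open file object (e.g. `open("sahara-all.log")`) to `error_lines` instead of the sample list.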

Question information

Language: English
Status: Solved
For: Fuel for OpenStack
Assignee: No assignee
Solved by: Fabrizio Soppelsa

abdul hannan ghafoor (ahannan-ghafoor) said:
#1

By the way, I also tried to launch a cluster manually using a cluster template. For that, I registered the CentOS 2.4.1 Sahara image in the Data Processing tab and used the Vanilla 2 template, which also uses Hadoop version 2.4.1.

I am getting the error "Node test-master-001 has error status". "test" is the name of my cluster.

The logs from sahara-all.log are as follows:

<14>Feb 9 06:57:22 node-28 sahara-all Cluster status has been changed: id=bff4e166-0054-4e59-8f9d-846f05d9d0bc, New status=Waiting
<14>Feb 9 06:57:23 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<11>Feb 9 06:57:23 node-28 sahara-all Error during operating cluster 'test' (reason: Node test-master-001 has error status)
2015-02-09 06:57:23.362 7918 TRACE sahara.service.ops Traceback (most recent call last):
2015-02-09 06:57:23.362 7918 TRACE sahara.service.ops File "/usr/lib/python2.6/site-packages/sahara/service/ops.py", line 113, in wrapper
2015-02-09 06:57:23.362 7918 TRACE sahara.service.ops f(cluster_id, *args, **kwds)
2015-02-09 06:57:23.362 7918 TRACE sahara.service.ops File "/usr/lib/python2.6/site-packages/sahara/service/ops.py", line 198, in _provision_cluster
2015-02-09 06:57:23.362 7918 TRACE sahara.service.ops INFRA.create_cluster(cluster)
2015-02-09 06:57:23.362 7918 TRACE sahara.service.ops File "/usr/lib/python2.6/site-packages/sahara/service/direct_engine.py", line 58, in create_cluster
2015-02-09 06:57:23.362 7918 TRACE sahara.service.ops self._await_active(cluster, instances)
2015-02-09 06:57:23.362 7918 TRACE sahara.service.ops File "/usr/lib/python2.6/site-packages/sahara/service/direct_engine.py", line 398, in _await_active
2015-02-09 06:57:23.362 7918 TRACE sahara.service.ops if self._check_if_active(instance):
2015-02-09 06:57:23.362 7918 TRACE sahara.service.ops File "/usr/lib/python2.6/site-packages/sahara/service/direct_engine.py", line 426, in _check_if_active
2015-02-09 06:57:23.362 7918 TRACE sahara.service.ops raise exc.SystemError(_("Node %s has error status") % server.name)
2015-02-09 06:57:23.362 7918 TRACE sahara.service.ops SystemError: Node test-master-001 has error status
2015-02-09 06:57:23.362 7918 TRACE sahara.service.ops
<14>Feb 9 06:57:23 node-28 sahara-all Cluster 'test' creation rollback (reason: Node test-master-001 has error status)
<14>Feb 9 06:57:23 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:57:23 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:57:24 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:57:25 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:57:26 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:57:26 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:57:27 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:57:27 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:57:27 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:57:28 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:57:28 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:57:29 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:57:29 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:57:30 node-28 sahara-all Starting new HTTP connection (1): 172.17.10.101
<14>Feb 9 06:57:35 node-28 sahara-all Cluster status has been changed: id=bff4e166-0054-4e59-8f9d-846f05d9d0bc, New status=Error

What does this error mean?
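The traceback reads as a chain: `_provision_cluster` calls `INFRA.create_cluster`, which waits for the instances in `_await_active`, and `_check_if_active` raises "Node test-master-001 has error status" once that instance reports an error status. Sahara then logs "creation rollback" and sets the cluster status to Error. In other words, the VM for test-master-001 failed to boot, and Sahara aborted the whole launch. The minimal Python sketch below models that wait-and-check loop under simplifying assumptions: `Server`, `NodeErrorStatus`, `fetch`, the timeout, and the `test-worker-001` node are hypothetical stand-ins, not Sahara's actual code or API. It only shows why a single instance in ERROR stops provisioning; the reason that instance failed would have to be looked for on the compute side (for example, in the Nova logs).

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Server:
    """Hypothetical stand-in for the server record Sahara polls."""
    name: str
    status: str  # e.g. "BUILD", "ACTIVE", "ERROR"


class NodeErrorStatus(Exception):
    """Hypothetical exception mirroring Sahara's "Node %s has error status"."""


def check_if_active(server: Server) -> bool:
    # An instance in ERROR will not become ACTIVE on its own,
    # so fail immediately instead of waiting for the timeout.
    if server.status == "ERROR":
        raise NodeErrorStatus("Node %s has error status" % server.name)
    return server.status == "ACTIVE"


def await_active(names: List[str], fetch: Callable[[str], Server],
                 poll_interval: float = 1.0, timeout: float = 600.0) -> None:
    """Poll every instance until all are ACTIVE; raise on ERROR or on timeout."""
    pending = set(names)
    deadline = time.monotonic() + timeout
    while pending:
        # sorted() makes a copy, so discarding from `pending` inside the loop is safe.
        for name in sorted(pending):
            if check_if_active(fetch(name)):
                pending.discard(name)
        if pending:
            if time.monotonic() > deadline:
                raise TimeoutError("instances not active: %s" % sorted(pending))
            time.sleep(poll_interval)


if __name__ == "__main__":
    # Simulated statuses: the (hypothetical) worker boots, the master ends up in ERROR.
    statuses: Dict[str, str] = {"test-master-001": "ERROR", "test-worker-001": "ACTIVE"}
    try:
        await_active(list(statuses), lambda n: Server(n, statuses[n]), poll_interval=0.0)
    except NodeErrorStatus as e:
        # At this point Sahara would roll back the cluster and mark it Error.
        print("provisioning aborted:", e)
```

The design point the sketch illustrates: checking for ERROR on every poll makes the failure surface as soon as the compute service reports it, rather than after the full timeout.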

Fabrizio Soppelsa (fsoppelsa) said:
#2

Greetings, Abdul Hannan,

Your issue is tracked in https://bugs.launchpad.net/fuel/+bug/1371083, although that bug is marked Invalid. I'm linking this question to it and letting the developers know that you have reproduced the problem. Please follow the bug page.

Best regards,
Fabrizio
Mirantis Fuel Team

abdul hannan ghafoor (ahannan-ghafoor) said:
#3

Thanks for the link. People there have stated that the error usually occurs when the compute node has less than 4 GB of RAM. My compute node has 64 GB of RAM, so I guess that does not apply here.

Fabrizio Soppelsa (fsoppelsa) said (best answer):
#4

Hello Abdul Hannan, could you please close this question and continue following the issue on the bug page?
The developers are asking you to provide further logs.

Thank you,
Fabrizio
Mirantis Fuel Team

abdul hannan ghafoor (ahannan-ghafoor) said:
#5

Thanks Fabrizio Soppelsa, that solved my question.