Murano Deploy Stalls when Scaling up Nodes

Asked by smarta94

When launching a Kubernetes cluster, everything appears fine. When I try to add an additional node via the actions > scaleup nodes option, a new VM is created and registered in etcd on the Kubernetes master node, and then everything seems to stall (i.e. no more progress is made). The new VM has been registered on the master and master-add-node.sh has been called (but nothing after that exists in the murano-agent log), and only the initial download of the murano-agent configuration settings happens on the new VM (i.e. no instructions past the "these are the login and connection information" stage that sets up murano.conf in /etc/..).
Furthermore, no meaningful log entries show up on the controller for murano-engine or murano-api.

/var/log/murano/murano-engine.log after initiating a scaleup action:

 log_http_response /usr/lib/python2.7/site-packages/muranoclient/common/http.py:124
2015-09-09 14:57:57.469 8987 DEBUG murano.dsl.executor [-] 0a88641ca9294ca1bf10b5b8642a4fd0: Begin execution: io.murano.system.Resources.yaml (-7294027655241992074) called from File "/tmp/murano-packages-cache/2f7479e9-a241-4954-a6d0-32e72497f450/io.murano.apps.docker.kubernetes.KubernetesCluster/Classes/KubernetesCluster.yaml", line 123:13 in method deploy of class io.murano.apps.docker.kubernetes.KubernetesCluster
    $.minionNodes.take($.nodeCount).select($.setupEtcd()) _invoke_method_implementation /usr/lib/python2.7/site-packages/murano/dsl/executor.py:142
2015-09-09 14:57:57.470 8987 DEBUG murano.dsl.executor [-] 0a88641ca9294ca1bf10b5b8642a4fd0: End execution: io.murano.system.Resources.yaml (-7294027655241992074) _invoke_method_implementation /usr/lib/python2.7/site-packages/murano/dsl/executor.py:160
2015-09-09 14:57:57.471 8987 DEBUG murano.dsl.executor [-] 0a88641ca9294ca1bf10b5b8642a4fd0: Begin execution: io.murano.apps.docker.kubernetes.KubernetesNode.getIp (-6747789967329102446) called from File "/tmp/murano-packages-cache/2f7479e9-a241-4954-a6d0-32e72497f450/io.murano.apps.docker.kubernetes.KubernetesCluster/Classes/KubernetesCluster.yaml", line 123:13 in method deploy of class io.murano.apps.docker.kubernetes.KubernetesCluster
    $.minionNodes.take($.nodeCount).select($.setupEtcd()) _invoke_method_implementation /usr/lib/python2.7/site-packages/murano/dsl/executor.py:142
2015-09-09 14:57:57.471 8987 DEBUG murano.dsl.executor [-] 0a88641ca9294ca1bf10b5b8642a4fd0: End execution: io.murano.apps.docker.kubernetes.KubernetesNode.getIp (-6747789967329102446) _invoke_method_implementation /usr/lib/python2.7/site-packages/murano/dsl/executor.py:160
2015-09-09 14:57:57.473 8987 DEBUG murano.dsl.executor [-] 0a88641ca9294ca1bf10b5b8642a4fd0: Begin execution: io.murano.system.Agent.call (-7715528276152306858) called from File "/tmp/murano-packages-cache/2f7479e9-a241-4954-a6d0-32e72497f450/io.murano.apps.docker.kubernetes.KubernetesCluster/Classes/KubernetesCluster.yaml", line 123:13 in method deploy of class io.murano.apps.docker.kubernetes.KubernetesCluster
    $.minionNodes.take($.nodeCount).select($.setupEtcd()) _invoke_method_implementation /usr/lib/python2.7/site-packages/murano/dsl/executor.py:142
2015-09-09 14:58:08.149 8987 DEBUG murano.engine.system.agent_listener [-] Got execution result: id 'ece45aa739c4482bb4d83db2922ca5d6' body '{u'Body': u'gateway-1=http://200.10.0.4:7001,MyVM-3=http://200.10.0.6:7001,MyVM-1=http://200.10.0.2:2380,MyVM-1=http://200.10.0.2:7001,MyVM-2=http://200.10.0.5:7001,MyVM-4=http://200.10.0.7:7001', u'SourceID': u'ece45aa739c4482bb4d83db2922ca5d6', u'ErrorCode': 0, u'FormatVersion': u'2.0.0', u'Time': u'2015-09-09 19:58:07.902294', u'Action': u'Execution:Result', u'ID': u'd00f7481e45c4013874b0bdf92044c87'}' _receive /usr/lib/python2.7/site-packages/murano/engine/system/agent_listener.py:88

In this case, MyVM-4 is the newly created VM to be added to the cluster. This is running on OpenStack Juno. The interface just sits there with "Configuring etcd node MyVM-4" and the status "Deploy in progress"; it will remain like that for hours until I manually delete the stack and remove the environment from the MySQL database (abort still doesn't exist in the Murano environment yet!).

Question information

Language: English
Status: Answered
For: Murano
Assignee: Stan Lagun
Stan Lagun (slagun) said:
#1

I guess this should be filed as a bug rather than a question. I will try to reproduce the issue, though I have run this exact scenario before and it did work. I don't know for sure what went wrong, but my guess is that you hit one of several similar bugs we have fixed since then. Murano uses RabbitMQ messaging to communicate with the murano guest agent, and each deployment creates its own message listener for agent responses. If a previous deployment wasn't shut down correctly, the listener from that deployment can steal messages that belong to the next deployment, making it wait forever for a response. All of that is hopefully fixed by now.

Here is how it can be verified: either upgrade to a newer version of Murano (say, Kilo), or, if that is impossible, try to restart the murano-engine daemon before you attempt to scale the cluster up. If the operation hangs when done without the service restart but works fine after the restart, then we will know for sure what the reason is and can see whether we can backport the needed fixes to Juno.
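For illustration, here is a minimal sketch of that failure mode using the pika library (1.x API) against a local RabbitMQ broker. The queue name, message handling, and callback are hypothetical stand-ins rather than Murano's real topology; the point is only that RabbitMQ distributes deliveries among all consumers of a queue, so a leftover consumer can swallow a reply that the current deployment is blocked waiting for:

    # Hypothetical sketch of a listener left behind by a previous deployment.
    # Assumes a local RabbitMQ broker and pika >= 1.0; 'agent-results' is a
    # placeholder queue name, not Murano's actual reply queue.
    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = conn.channel()
    channel.queue_declare(queue='agent-results')

    def stale_listener(ch, method, properties, body):
        # The leftover consumer receives -- and effectively discards -- the
        # agent reply that the *current* deployment is waiting for, so that
        # deployment never makes progress.
        print('stale listener consumed: %r' % body)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue='agent-results',
                          on_message_callback=stale_listener)
    channel.start_consuming()  # keeps consuming until the process is killed

Restarting the engine process kills any such leftover consumer, which is why the restart works as a diagnostic.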

smarta94 (smarta94) said:
#2

Restarting murano-engine does seem to allow scaleup nodes to work. One other thing to mention: when I click any of the four options, including the export config, a "Something went wrong" page appears. Clicking back shows the action I selected to be in the process of running (and it only completes if I've done the engine restart). Is there a way to fix the "Something went wrong" page issue as well, or is this related to the bug that requires restarting the engine?

Stan Lagun (slagun) said:
#3

The issue with exportConfig is that it returns a file, and support for action results was only added in Kilo, if I'm not mistaken. The Kubernetes app is not 100% compatible with Juno. Error messages were also improved in Kilo, so there you would see what actually went wrong.
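For what it's worth, on Kilo and later an action result can be fetched over the Murano HTTP API. The sketch below is an assumption based on the Kilo-era action API as I understand it (POST to trigger an action, then GET on the returned task id to poll for the result); the endpoint URL, token, and IDs are placeholders, not values from this deployment:

    # Hedged sketch: trigger a Murano action and poll for its result.
    # Endpoint paths assume the Kilo-era action API; all IDs are placeholders.
    import time
    import requests

    MURANO = 'http://controller:8082/v1'            # placeholder API endpoint
    HEADERS = {'X-Auth-Token': '<keystone-token>'}  # placeholder token
    ENV = '<environment-id>'
    ACTION = '<exportConfig-action-id>'

    # Trigger the action; the API returns a task id to poll.
    resp = requests.post('%s/environments/%s/actions/%s' % (MURANO, ENV, ACTION),
                         headers=HEADERS, json={})
    task_id = resp.json()

    # Poll for the result -- the piece that was only added in Kilo.
    while True:
        r = requests.get('%s/environments/%s/actions/%s' % (MURANO, ENV, task_id),
                         headers=HEADERS)
        if r.status_code == 200:
            print(r.json())  # for exportConfig this would carry the file data
            break
        time.sleep(5)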

smarta94 (smarta94) said:
#4

Is there a way to fix the need to restart the murano-engine process on Juno? The systems I use currently have no access to the outside network (they are isolated on an internal network), so updating everything to Kilo is not really an option at the moment. I do not care so much about the export config as about not needing to restart murano-engine every time a cluster needs to be changed (I am building the OpenStack system for users, and they should not have to tell me every time they need to make changes on their tenants).

Stan Lagun (slagun) said:
#5

Unfortunately we do not support OpenStack Juno anymore. It is likely that you were affected by one of the following bugs:
https://bugs.launchpad.net/murano/+bug/1449500
https://bugs.launchpad.net/murano/+bug/1425963
You can try to apply those fixes to the Juno code base manually.
