etcd member existing backwards?

Asked by smarta94
Is the "existing" tag for --initial-cluster-state backwards in this file?
Shouldn't EXISTING_ETCD_OPTS correlate with the tag that includes an existing state?

Question information

Language: English
Project: Murano
Assignee: None
Revision history for this message
Stan Lagun (slagun) said :

Not sure I've got your question right. In this script, %%CLUSTER_CONFIG%% and %%NAME%% get replaced with values provided by an existing etcd node. This is exactly how static etcd clustering works.
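As a concrete illustration of that substitution (the node name, IP, cluster string, and the sed-based mechanism below are all illustrative assumptions on my part; Murano performs the replacement on its own side, not with sed):

```shell
# Illustrative only: emulate Murano's %%PLACEHOLDER%% substitution for one node.
# NAME/IP/CLUSTER_CONFIG values are made up for the example.
NAME="kube-2"
IP="10.0.0.2"
CLUSTER_CONFIG="kube-1=http://10.0.0.1:7001,kube-2=http://10.0.0.2:7001"

TEMPLATE='--name %%NAME%% --initial-cluster-state existing --initial-cluster %%CLUSTER_CONFIG%% --initial-advertise-peer-urls http://%%IP%%:7001'

# '|' as the sed delimiter avoids clashing with the '/' in the URLs.
INIT_ETCD_OPTS=$(printf '%s' "$TEMPLATE" \
  | sed -e "s|%%NAME%%|$NAME|g" \
        -e "s|%%IP%%|$IP|g" \
        -e "s|%%CLUSTER_CONFIG%%|$CLUSTER_CONFIG|g")
echo "$INIT_ETCD_OPTS"
```

If a node receives blank values (the issue reported below), the placeholders simply expand to empty strings and etcd gets unusable options.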

Revision history for this message
smarta94 (smarta94) said :

INIT_ETCD_OPTS="--name %%NAME%% --initial-cluster-state existing --initial-cluster %%CLUSTER_CONFIG%% --data-dir /var/lib/etcd --snapshot-count 1000 --listen-peer-urls http://%%IP%%:7001, --listen-client-urls http://%%IP%%:4001, --initial-advertise-peer-urls http://%%IP%%:7001 --advertise-client-urls http://%%IP%%:4001,"

EXISTING_ETCD_OPTS="--name %%NAME%% --data-dir /var/lib/etcd --snapshot-count 1000 --listen-peer-urls http://%%IP%%:7001, --listen-client-urls http://%%IP%%:4001, --advertise-client-urls http://%%IP%%:4001,"

if [ -d /var/lib/etcd/wal/ ]; then
  # This will allow the etcd service to restart properly and pick up properties from other peers

In this script, the if check tests whether the member has already been enrolled and, if so, selects the existing-cluster options (EXISTING_ETCD_OPTS); if not, it should treat it as a new member (INIT_ETCD_OPTS). At least that is what the naming terminology/common sense suggests. Yet --initial-cluster-state is set to "existing" in INIT_ETCD_OPTS and does not appear in EXISTING_ETCD_OPTS at all, which by naming convention seems backwards. Also, on my deployment only the first node is given values to replace CLUSTER_CONFIG and NAME; additional nodes are sent blank values, so etcd cannot start on those nodes, they never enroll as members with the master, and the cluster deployment ends in a failed state. What would cause this behavior?
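The branch being discussed can be sketched as follows (the function name and the echoed labels are illustrative, not taken from the actual script; the real script presumably assigns the chosen option set to a variable consumed by the etcd service):

```shell
# Sketch of the WAL-directory check: once etcd has completed its initial
# registration it creates <data-dir>/wal, so that directory's presence
# distinguishes an already-initialized member from a brand-new one.
choose_etcd_opts() {
  data_dir="$1"
  if [ -d "$data_dir/wal" ]; then
    # Already registered: normal-operation options (EXISTING_ETCD_OPTS)
    echo "existing"
  else
    # First run: registration options (INIT_ETCD_OPTS)
    echo "init"
  fi
}

tmp=$(mktemp -d)
choose_etcd_opts "$tmp"      # no wal dir yet -> prints: init
mkdir -p "$tmp/wal"
choose_etcd_opts "$tmp"      # after registration -> prints: existing
rm -rf "$tmp"
```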

Stan Lagun (slagun) said :

Regarding options:
Here is how I understand it. To add a member to an etcd cluster we first go to one of the existing etcd nodes (the k8s master node in our case) and say "we want to add a member with name=X and IP=Y". It gives us something (CLUSTER_CONFIG) that we need to provide to the new member so that it can authenticate itself to the master. By the time the mentioned script gets executed the etcd cluster already exists (it contains at least one member, the master) and the new member is already added but not yet "activated". It still needs to advertise itself to the master to finish the transaction (until then the master is in a non-operational mode, which is why you cannot add two members simultaneously).
Until the new member completes registration there is no /var/lib/etcd/wal/, so the $INIT_ETCD_OPTS options are used. They make etcd authenticate to the master, complete the registration process, and create /var/lib/etcd/wal/. On subsequent daemon runs the $EXISTING_ETCD_OPTS options will be used, which correspond to normal operation mode. This conforms to the names (INIT = initialization, EXISTING = already initialized).

Regarding your issues:
I don't know what went wrong. Last time I checked this app everything worked fine. Maybe you are not using the very same image the application was written for; there have been changes to both k8s and etcd since then. But the fact that %%SOMETHING%% gets blank is very strange, since %%NAME%% is just the name of the instance and is not obtained from the master.

smarta94 (smarta94) said :

OK, the first part makes sense then.
As to the second issue, I have added lines to the YAML file to print out what it gets there, and they return either "none" or nothing when reporting to the interface:

- $$, 'Configuration of Cluster name for node {0}'.format($clusterConfig))
- $$, 'IP Address of Cluster Master for node {0}'.format($._cluster.masterNode.getIp()))

I have described the openstack/python client versions I am using here:

Stan Lagun (slagun) said :

As I said, I haven't observed this before (and I have deployed k8s many, many times). So the problem is probably due to the image being wrong or something wrong in your environment (DevStack etc.). Since you are the only one who has reported this issue it will be really hard for us to reproduce. But if you provide all the log files (the Murano engine's log and the log file from each of the agents involved) it may help to diagnose what went wrong, especially if you start from a clean log file. You can attach all the files (including changes you made to the package) right to the bug report. Also note that in murano.conf there should be debug=True and verbose=True.
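For reference, those flags live in murano.conf; a minimal sketch, assuming the standard oslo.config layout with both options in the [DEFAULT] section:

```ini
[DEFAULT]
debug = True
verbose = True
```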

smarta94 (smarta94) said :

You really want the entire log of a launch for murano from the controller (i.e. murano-engine)? That's several thousand lines.
I can get the master and minion logs for you; they are much more manageable.

Stan Lagun (slagun) said :

I do need the engine log to see what was going on on the server side (MuranoPL, Heat etc.); maybe there will be something there. And I also need the log files from the spawned VMs (murano-agent.log). However, thousands of lines is not good. There are two things that can greatly simplify the investigation:
1. Reset the log files (and restart Murano) so there won't be any traces of old deployments
2. Configure murano-engine to write to a separate log file. The Murano API sometimes produces lots and lots of garbage messages and I don't need the API logs at all
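One way to separate the engine log (an assumption on my part: the option name is the standard oslo.log `log_file`, and murano-engine must be started against this config, or with a `--log-file` command-line override, rather than the shared default):

```ini
[DEFAULT]
# Write engine output to its own file so API noise stays out of it;
# the path below is only an example.
log_file = /var/log/murano/murano-engine.log
debug = True
verbose = True
```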

smarta94 (smarta94) said :

With a clean engine log, started as the deployment started, it is still around 3000 lines.

Stan Lagun (slagun) said :

Okay, can you attach it and the murano-agent logs from the VMs to the bug report?

smarta94 (smarta94) said :

It's on the bug report.

smarta94 (smarta94) said :

I have tracked the issue down to iptables and Murano, and modified the scripts to account for iptables rules not being set correctly.