Cannot create an 8+ node Vertica cluster

Asked by julian-if on 2017-10-08

I am creating a Vertica cluster on OpenStack (Kilo) through Trove. Creating a 2-node, 4-node, or 6-node Vertica cluster works, but creating an 8-node Vertica cluster (formed by 8 Nova instances) fails.

I have 4 physical servers (1 controller node and 3 compute nodes), each with 8 physical CPUs and 128 GB of RAM. The flavor used for the Vertica cluster has 2 vCPUs, 4 GB RAM, a 15 GB root disk, and a 2 GB swap disk.

Since OpenStack allows CPU overcommit at 16:1 and RAM overcommit at 1.5:1, 512 vCPUs and 768 GB of memory could be used in the environment. (At most 8 vCPUs are assigned to each instance.)
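To make the capacity arithmetic explicit, here is a minimal sketch in Python. The function name is hypothetical, and the second calculation assumes that only the 3 compute nodes actually host instances, which is not stated in the question. The ratios are the ones quoted above.

```python
def overcommit_capacity(hosts, cpus_per_host, ram_gb_per_host,
                        cpu_ratio=16.0, ram_ratio=1.5):
    """Return (schedulable vCPUs, schedulable RAM in GB) for a set of hosts."""
    vcpus = int(hosts * cpus_per_host * cpu_ratio)
    ram_gb = hosts * ram_gb_per_host * ram_ratio
    return vcpus, ram_gb


# All 4 servers counted, as in the figures quoted in the question.
all_hosts = overcommit_capacity(hosts=4, cpus_per_host=8, ram_gb_per_host=128)

# Assumption: only the 3 compute nodes run instances.
compute_only = overcommit_capacity(hosts=3, cpus_per_host=8, ram_gb_per_host=128)

# Demand of an 8-node cluster with the 2 vCPU / 4 GB flavor.
cluster_nodes = 8
cluster_demand = (cluster_nodes * 2, cluster_nodes * 4)

print("all hosts:    ", all_hosts)
print("compute only: ", compute_only)
print("8-node demand:", cluster_demand)
```

Under either count, the 8-node cluster asks for far less than the schedulable capacity, so a plain quota or overcommit limit seems an unlikely cause on its own.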

After creating the Vertica cluster, I found that the Trove instances of the cluster are in ERROR status. I then used /opt/vertica/bin/admintools to check the state of the database cluster, which is UP. I logged in to the database and tried '\d' and '\dn', but the session hangs with no response.

I have changed the install_vertica command to "/opt/vertica/sbin/install_vertica -s %s -d %s -X -N -T -r /vertica.deb -L /vertica.dat -Y --no-system-checks --failure-threshold NONE", but creating an 8-node Vertica cluster still fails.

I went through all the logs in the 8-node Vertica cluster; the relevant entries are listed here.

The network socket experienced an error. This Spread mailbox will no longer work until the connection is disconnected and then reconnected
2017-10-08 06:22:23.027 CatchUp:0x7f340c00eb70-d000000000000e @v_db_srvr_node0004: {runRecover} 08006/4539: Received no response from v_db_srvr_node0006 in transaction bind
2017-10-08 06:23:19.437 CatchUp:0x7f340c00eb70 @v_db_srvr_node0004: 00000/3298: Event Posted: Event Code:6 Event Id:3 Event Severity: Informational [6] PostedTimestamp: 2017-10-08 06:23:19.437249 ExpirationTimestamp: 2085-10-26 09:37:26.437249 EventCodeDescription: Node State Change ProblemDescription: Changing node v_db_srvr_node0004 startup state to RECOVER_ERROR DatabaseName: db_srvr Hostname: vertica-cluster-member-8
2017-10-08 06:23:19.437 CatchUp:0x7f340c00eb70 [Recover] Changing node v_db_srvr_node0004 startup state from SHUTDOWN to RECOVER_ERROR

10/08/17 06:21:56 SP_connect: unable to connect via UNIX socket to /tmp/4803 (pid=4472): Error: No such file or directory

Oct 8 06:34:17 [5791] [vsql.connect] EOF ERROR: vsql: could not connect to server: Connection refused
Oct 8 06:34:17 [5791] [vsql.connect] EOF ERROR:

ERROR trove.guestagent.datastore.experimental.vertica.service [-] Vertica database create failed.

DB server is not installed or is in restart mode, so for now we'll skip determining the status of DB on this instance.
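
When comparing logs across 8 nodes, it can help to pull out which node reported which peer as unresponsive and which startup state changes occurred. The following is a rough sketch in Python; the regular expressions are written against the two log lines quoted above only, and the function name is hypothetical. Other Vertica log formats would likely need different patterns.

```python
import re

# Two lines copied from the log excerpt above.
LOG_LINES = [
    "2017-10-08 06:22:23.027 CatchUp:0x7f340c00eb70-d000000000000e "
    "@v_db_srvr_node0004: {runRecover} 08006/4539: Received no response "
    "from v_db_srvr_node0006 in transaction bind",
    "2017-10-08 06:23:19.437 CatchUp:0x7f340c00eb70 [Recover] Changing node "
    "v_db_srvr_node0004 startup state from SHUTDOWN to RECOVER_ERROR",
]

# Patterns are assumptions derived from the sample lines, not a documented format.
NO_RESPONSE = re.compile(
    r"@(?P<reporter>v_\w+): .*Received no response from (?P<peer>v_\w+)"
)
STATE_CHANGE = re.compile(
    r"Changing node (?P<node>v_\w+) startup state "
    r"from (?P<old>\w+) to (?P<new>\w+)"
)


def summarize(lines):
    """Collect (reporter, peer) pairs and (node, old, new) state changes."""
    unresponsive = []
    transitions = []
    for line in lines:
        match = NO_RESPONSE.search(line)
        if match:
            unresponsive.append((match.group("reporter"), match.group("peer")))
        match = STATE_CHANGE.search(line)
        if match:
            transitions.append(
                (match.group("node"), match.group("old"), match.group("new"))
            )
    return unresponsive, transitions


unresponsive, transitions = summarize(LOG_LINES)
print(unresponsive)
print(transitions)
```

Running this over the vertica.log of every node would show whether the same peers are reported as unresponsive each time, which may help separate a network problem between particular instances from a general startup failure.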
