Cannot create an 8+ node Vertica cluster

Asked by julian-if on 2017-10-08

I am creating a Vertica cluster on OpenStack (Kilo) through Trove. Creating a 2-node, 4-node, or 6-node Vertica cluster works, but creating an 8-node Vertica cluster (formed by 8 Nova instances) fails.

Background:
I have 4 physical servers (1 controller node and 3 compute nodes), each with 8 physical CPUs and 128 GB of RAM. The flavor used for the Vertica cluster has 2 VCPUs, 4 GB RAM, a 15 GB root disk, and a 2 GB swap disk.

Because OpenStack allows CPU over-commit at 16:1 and RAM over-commit at 1.5:1, 512 vCPUs and 768 GB of memory should be usable in the environment. (At most 8 vCPUs are assigned to each instance.)
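To make the capacity arithmetic explicit, here is a minimal Python sketch. It assumes the "128 RAM" figure means 128 GB per server, and the helper names are made up for illustration. It also shows the same calculation for only the 3 compute nodes, in case the controller node does not host instances.

```python
# Figures from this question; RAM per host is assumed to be in GB.
PHYSICAL_CPUS_PER_HOST = 8
RAM_GB_PER_HOST = 128
CPU_OVERCOMMIT = 16.0   # CPU over-commit ratio quoted above
RAM_OVERCOMMIT = 1.5    # RAM over-commit ratio quoted above

FLAVOR_VCPUS = 2        # flavor used for each Vertica instance
FLAVOR_RAM_GB = 4


def capacity(hosts):
    """Return (vcpus, ram_gb) that can be scheduled across `hosts` servers."""
    vcpus = hosts * PHYSICAL_CPUS_PER_HOST * CPU_OVERCOMMIT
    ram_gb = hosts * RAM_GB_PER_HOST * RAM_OVERCOMMIT
    return vcpus, ram_gb


def demand(instances):
    """Return (vcpus, ram_gb) requested by `instances` cluster members."""
    return instances * FLAVOR_VCPUS, instances * FLAVOR_RAM_GB


if __name__ == "__main__":
    print("all 4 servers:   ", capacity(4))
    print("3 compute nodes: ", capacity(3))
    print("8-node cluster:  ", demand(8))
```

Either way, the 8-node cluster requests far less than the scheduling limits, so quota exhaustion by itself does not look like the explanation.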

After creating the Vertica cluster, I found that the Trove instances of the cluster are in ERROR status. I then used /opt/vertica/bin/admintools to view the database cluster state, which is UP. I logged in to the database and tried '\d' and '\dn', but the commands hang with no response.

I have changed the install_vertica command to "/opt/vertica/sbin/install_vertica -s %s -d %s -X -N -T -r /vertica.deb -L /vertica.dat -Y --no-system-checks --failure-threshold NONE", but creating an 8-node Vertica cluster still fails.
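For readers unfamiliar with the two %s placeholders, the sketch below shows how such a template could be filled in. It assumes the first placeholder receives the comma-separated member addresses (install_vertica's -s host list) and the second the data directory (-d). The IP addresses, the directory, and the function name are hypothetical and not taken from my environment.

```python
# Command template exactly as quoted in this question.
INSTALL_VERTICA = (
    "/opt/vertica/sbin/install_vertica -s %s -d %s -X -N -T "
    "-r /vertica.deb -L /vertica.dat -Y --no-system-checks "
    "--failure-threshold NONE"
)


def build_install_command(member_ips, data_dir):
    """Fill the template with a comma-separated host list and a data dir."""
    return INSTALL_VERTICA % (",".join(member_ips), data_dir)


if __name__ == "__main__":
    # Hypothetical addresses for an 8-member cluster.
    ips = ["10.0.0.%d" % n for n in range(11, 19)]
    print(build_install_command(ips, "/var/lib/vertica"))
```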

I went through all the logs on the 8-node Vertica cluster; the relevant entries are listed here.

vertica.log
The network socket experienced an error. This Spread mailbox will no longer work until the connection is disconnected and then reconnected
2017-10-08 06:22:23.027 CatchUp:0x7f340c00eb70-d000000000000e @v_db_srvr_node0004: {runRecover} 08006/4539: Received no response from v_db_srvr_node0006 in transaction bind
2017-10-08 06:23:19.437 CatchUp:0x7f340c00eb70 @v_db_srvr_node0004: 00000/3298: Event Posted: Event Code:6 Event Id:3 Event Severity: Informational [6] PostedTimestamp: 2017-10-08 06:23:19.437249 ExpirationTimestamp: 2085-10-26 09:37:26.437249 EventCodeDescription: Node State Change ProblemDescription: Changing node v_db_srvr_node0004 startup state to RECOVER_ERROR DatabaseName: db_srvr Hostname: vertica-cluster-member-8
2017-10-08 06:23:19.437 CatchUp:0x7f340c00eb70 [Recover] Changing node v_db_srvr_node0004 startup state from SHUTDOWN to RECOVER_ERROR

dbLog
10/08/17 06:21:56 SP_connect: unable to connect via UNIX socket to /tmp/4803 (pid=4472): Error: No such file or directory

adminTools-dbadmin.log
Oct 8 06:34:17 [5791] [vsql.connect] EOF ERROR: vsql: could not connect to server: Connection refused
Oct 8 06:34:17 [5791] [vsql.connect] EOF ERROR:

trove-guestagent.log
ERROR trove.guestagent.datastore.experimental.vertica.service [-] Vertica database create failed.

DB server is not installed or is in restart mode, so for now we'll skip determining the status of DB on this instance.
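To compare nodes, the state transitions can be pulled out of each vertica.log with a short script. This is a minimal sketch that assumes every transition line has the same shape as the "[Recover] Changing node ... startup state from ... to ..." entry quoted above; the function name is only for illustration.

```python
import re

# Matches lines such as:
# 2017-10-08 06:23:19.437 CatchUp:0x... [Recover] Changing node
#   v_db_srvr_node0004 startup state from SHUTDOWN to RECOVER_ERROR
STATE_CHANGE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"Changing node (?P<node>\S+) startup state "
    r"from (?P<old>\S+) to (?P<new>\S+)"
)


def state_changes(lines):
    """Yield (timestamp, node, old_state, new_state) for each transition."""
    for line in lines:
        match = STATE_CHANGE.search(line)
        if match:
            yield (match.group("ts"), match.group("node"),
                   match.group("old"), match.group("new"))


if __name__ == "__main__":
    sample = [
        "2017-10-08 06:23:19.437 CatchUp:0x7f340c00eb70 [Recover] "
        "Changing node v_db_srvr_node0004 startup state "
        "from SHUTDOWN to RECOVER_ERROR",
    ]
    for change in state_changes(sample):
        print(change)
```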

Question information

Language: English
Status: Open
For: Ubuntu openstack-trove
Assignee: No assignee
Last query: 2017-10-08
Last reply: