Error with more than one cinder-volume

Asked by renminmin on 2013-05-23

I downloaded the Grizzly version of OpenStack and deployed it on two nodes (node1 and dev202):
node1: cinder-api, cinder-scheduler, cinder-volume, keystone, mysql, qpid
dev202: cinder-volume only

rpc_backend = cinder.openstack.common.rpc.impl_qpid
qpid_durable_queues = True

Watching the queue status with qpid-tool (list queue active):
    252353 07:50:56 - 248315.cinder-scheduler
    252354 07:50:56 - 248315.cinder-scheduler.node1
    252355 07:50:56 - 248315.cinder-scheduler_fanout_e0ef7071e1b744769df5f06dae595550
    252479 07:52:27 - 248315.cinder-volume
    252480 07:52:27 - 248315.cinder-volume.node1
    252481 07:52:27 - 248315.cinder-volume_fanout_df149604027d49fabd8853f3acb7e997
    252549 07:52:49 - 248315.cinder-volume.dev202
    252550 07:52:49 - 248315.cinder-volume_fanout_4bab111c0de74d8285b12ba4990d9ec9

Then I stopped the cinder-volume service on node1 (with the kill command).
list queue active:
    252353 07:50:56 - 248315.cinder-scheduler
    252354 07:50:56 - 248315.cinder-scheduler.node1
    252355 07:50:56 - 248315.cinder-scheduler_fanout_e0ef7071e1b744769df5f06dae595550

All of the cinder-volume queues were deleted.
After this, volumes created with cinder create stayed in the creating status; even though cinder-volume on dev202 was still active, it received no messages from the queue.
However, cinder-volume on dev202 still reported its status to cinder-scheduler.

How can I solve this problem?
I think the primary cause is the QueueDelete message.

Question information

Language:
English
Status:
Solved
For:
Cinder
Assignee:
No assignee
Solved by:
renminmin
Solved:
2013-06-04
Last query:
2013-06-04
Last reply:
2013-05-31

This question was reopened

Shanthakumar K (shantha-kumar) said : #1

Ensure that the cinder-scheduler receives updates from the cinder-volume service on each node, and check the volume and scheduler logs for more information.

Please share the logs and conf files so we can help you more.

renminmin (rmm0811) said : #2

My Cinder config files and part of the logs:
=============================================
node1 config file (cinder-api, cinder-scheduler, and cinder-volume run with the same config file):
cinder.conf
[DEFAULT]
logdir = /var/log/cinder
state_path = /var/lib/cinder
lock_path = /var/lib/cinder/tmp
gpfs_volumes_dir = /gpfs/volumes/volumes/node1
iscsi_helper = tgtadm
sql_connection = mysql://cinder:cinder@192.168.0.201/cinder
rpc_backend = cinder.openstack.common.rpc.impl_qpid
rootwrap_config = /etc/cinder/rootwrap.conf
auth_uri = http://192.168.0.201:5000
auth_strategy = keystone
qpid_hostname = localhost
api_paste_config = /etc/cinder/api-paste.ini
volume_name_template = volume-%s
volume_group = cinder-volumes
verbose = True
debug = True
volume_driver = cinder.volume.drivers.gpfs.GpfsDriver
scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,CapabilitiesFilter
storage_availability_zone = test:dev188

[keystone_authtoken]
admin_tenant_name = service
admin_user = cinder
admin_password = cinder
auth_host = 192.168.0.201
auth_port = 35357
auth_protocol = http
signing_dirname = /tmp/keystone-signing-cinder

================================================================
dev202 (cinder-volume runs on it):
cinder.conf

[DEFAULT]
logdir = /var/log/cinder
state_path = /var/lib/cinder
lock_path = /var/lib/cinder/tmp
gpfs_volumes_dir = /gpfs/volumes/volumes/dev202
iscsi_helper = tgtadm
sql_connection = mysql://cinder:cinder@192.168.0.201/cinder
rpc_backend = cinder.openstack.common.rpc.impl_qpid
rootwrap_config = /etc/cinder/rootwrap.conf
auth_uri = http://192.168.0.201:5000
auth_strategy = keystone
qpid_hostname = 192.168.0.201
api_paste_config = /etc/cinder/api-paste.ini
volume_name_template = volume-%s
volume_group = cinder-volumes
verbose = True
debug = True
volume_driver = cinder.volume.drivers.gpfs.GpfsDriver
scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,CapabilitiesFilter
storage_availability_zone = test:dev202

[keystone_authtoken]
admin_tenant_name = service
admin_user = cinder
admin_password = cinder
auth_host = 192.168.0.201
auth_port = 35357
auth_protocol = http

====================================================================

node1 : kill -9 cinder-volume
/var/log/cinder/volume.log

2013-05-23 21:39:57 INFO [cinder.service] Caught SIGTERM, stopping children
2013-05-23 21:39:57 INFO [cinder.service] Waiting on 1 children to exit
2013-05-23 21:39:57 INFO [cinder.service] Caught SIGTERM, exiting
2013-05-23 21:39:57 DEBUG [qpid.messaging.io.ops] SENT[2638c68]: MessageCancel(destination='0', id=serial(25))
2013-05-23 21:39:57 DEBUG [qpid.messaging.io.ops] SENT[2638c68]: QueueDelete(queue='cinder-volume', id=serial(26), sync=True)
2013-05-23 21:39:57 DEBUG [qpid.messaging.io.ops] SENT[2638c68]: ConnectionClose(reply_code=200)
2013-05-23 21:39:57 DEBUG [qpid.messaging.io.raw] SENT[2638c68]: '\x0f\x01\x00\x14\x00\x01\x00\x00\x00\x00\x00\x00\x04\x08\x01\x00\x01\x00\x010\x0f\x01\x00 \x00\x01\x00\x00\x00\x00\x00\x00\x08\x02\x01\x01\x01\x00\rcinder-volume\x0f\x00\x00\x12\x00\x00\x00\x00\x00\x00\x00\x00\x01\x0b\x01\x00\x00\xc8'
2013-05-23 21:39:57 DEBUG [qpid.messaging.io.raw] READ[2638c68]: '\x0f\x00\x00\x1a\x00\x00\x00\x00\x00\x00\x00\x00\x02\n\x01\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x1a\x0f\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x01\x0c\x00\x00'
2013-05-23 21:39:57 DEBUG [qpid.messaging.io.ops] RCVD[2638c68]: SessionCompleted(commands=[0-26])
2013-05-23 21:39:57 DEBUG [qpid.messaging.io.ops] RCVD[2638c68]: ConnectionCloseOk()
2013-05-23 21:39:57 INFO [cinder.service] Child 332 exited with status 1

=============================================================
scheduler.log after cinder-volume was killed on node1:

2013-05-23 22:29:36 DEBUG [qpid.messaging] RCVD[3dff8c0]: Message({u'_context_roles': [u'admin'], u'_context_request_id': u'req-23a28132-0658-4392-a0fe-128beb54f0da', u'_context_quota_class': None, u'_context_auth_token': None, u'args': {u'service_name': u'volume', u'host': u'dev202', u'capabilities': {u'QoS_support': False, u'volume_backend_name': u'GPFS', u'free_capacity_gb': 45.0, u'driver_version': u'1.0', u'total_capacity_gb': 50.0, u'reserved_percentage': 0, u'vendor_name': u'IBM', u'storage_protocol': u'file'}}, u'_context_tenant': None, u'_unique_id': u'a6ac60baf6394f649950b5c23cf657de', u'_context_timestamp': u'2013-05-24T03:29:33.537222', u'_context_is_admin': True, u'version': u'1.0', u'_context_project_id': None, u'_context_user': None, u'_context_read_deleted': u'no', u'_context_user_id': None, u'method': u'update_service_capabilities', u'_context_remote_address': None})
2013-05-23 22:29:36 DEBUG [qpid.messaging.io.ops] SENT[3df9758]: ConnectionHeartbeat()
2013-05-23 22:29:36 DEBUG [qpid.messaging] RETR[3dff8c0]: Message({u'_context_roles': [u'admin'], u'_context_request_id': u'req-23a28132-0658-4392-a0fe-128beb54f0da', u'_context_quota_class': None, u'_context_auth_token': None, u'args': {u'service_name': u'volume', u'host': u'dev202', u'capabilities': {u'QoS_support': False, u'volume_backend_name': u'GPFS', u'free_capacity_gb': 45.0, u'driver_version': u'1.0', u'total_capacity_gb': 50.0, u'reserved_percentage': 0, u'vendor_name': u'IBM', u'storage_protocol': u'file'}}, u'_context_tenant': None, u'_unique_id': u'a6ac60baf6394f649950b5c23cf657de', u'_context_timestamp': u'2013-05-24T03:29:33.537222', u'_context_is_admin': True, u'version': u'1.0', u'_context_project_id': None, u'_context_user': None, u'_context_read_deleted': u'no', u'_context_user_id': None, u'method': u'update_service_capabilities', u'_context_remote_address': None})
2013-05-23 22:29:36 DEBUG [qpid.messaging.io.ops] SENT[3df9758]: MessageFlow(destination='2', unit=0, value=1L, id=serial(120))
2013-05-23 22:29:36 DEBUG [cinder.openstack.common.rpc.amqp] received {u'_context_roles': [u'admin'], u'_context_request_id': u'req-23a28132-0658-4392-a0fe-128beb54f0da', u'_context_quota_class': None, u'_unique_id': u'a6ac60baf6394f649950b5c23cf657de', u'_context_read_deleted': u'no', u'args': {u'service_name': u'volume', u'host': u'dev202', u'capabilities': {u'QoS_support': False, u'volume_backend_name': u'GPFS', u'free_capacity_gb': 45.0, u'driver_version': u'1.0', u'total_capacity_gb': 50.0, u'reserved_percentage': 0, u'vendor_name': u'IBM', u'storage_protocol': u'file'}}, u'_context_tenant': None, u'_context_auth_token': '<SANITIZED>', u'_context_is_admin': True, u'version': u'1.0', u'_context_project_id': None, u'_context_timestamp': u'2013-05-24T03:29:33.537222', u'_context_user': None, u'_context_user_id': None, u'method': u'update_service_capabilities', u'_context_remote_address': None}
2013-05-23 22:29:36 DEBUG [cinder.openstack.common.rpc.amqp] unpacked context: {'user_id': None, 'roles': [u'admin'], 'timestamp': u'2013-05-24T03:29:33.537222', 'auth_token': '<SANITIZED>', 'remote_address': None, 'quota_class': None, 'is_admin': True, 'user': None, 'request_id': u'req-23a28132-0658-4392-a0fe-128beb54f0da', 'project_id': None, 'read_deleted': u'no', 'tenant': None}
2013-05-23 22:29:36 DEBUG [qpid.messaging.io.ops] SENT[3df9758]: SessionCompleted(commands=[0-102])
2013-05-23 22:29:36 DEBUG [qpid.messaging.io.raw] SENT[3df9758]: '\x0f\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x01\n\x00\x00\x0f\x01\x00\x19\x00\x01\x00\x00\x00\x00\x00\x00\x04\n\x01\x00\x07\x00\x012\x00\x00\x00\x00\x01\x0f\x00\x00\x1a\x00\x00\x00\x00\x00\x00\x00\x00\x02\n\x01\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00f'
2013-05-23 22:29:36 DEBUG [cinder.scheduler.host_manager] Received volume service update from dev202.

renminmin (rmm0811) said : #4

When I stop the volume service on node1, a "Delete failed" message appears in cinder.log; however, when I stop the service on dev202, it does not appear.

Service stop log:

2013-05-27 04:10:23 INFO [cinder.service] Caught SIGTERM, stopping children
2013-05-27 04:10:23 INFO [cinder.service] Waiting on 1 children to exit
2013-05-27 04:10:23 INFO [cinder.service] Caught SIGTERM, exiting
2013-05-27 04:10:23 DEBUG [qpid.messaging.io.ops] SENT[37e8c68]: MessageCancel(destination='0', id=serial(25))
2013-05-27 04:10:23 DEBUG [qpid.messaging.io.ops] SENT[37e8c68]: QueueDelete(queue='cinder-volume', id=serial(26), sync=True)
2013-05-27 04:10:23 DEBUG [qpid.messaging.io.ops] SENT[37e8c68]: ConnectionClose(reply_code=200)
2013-05-27 04:10:23 DEBUG [qpid.messaging.io.raw] SENT[37e8c68]: '\x0f\x01\x00\x14\x00\x01\x00\x00\x00\x00\x00\x00\x04\x08\x01\x00\x01\x00\x010\x0f\x01\x00 \x00\x01\x00\x00\x00\x00\x00\x00\x08\x02\x01\x01\x01\x00\rcinder-volume\x0f\x00\x00\x12\x00\x00\x00\x00\x00\x00\x00\x00\x01\x0b\x01\x00\x00\xc8'
2013-05-27 04:10:23 DEBUG [qpid.messaging.io.raw] READ[37e8c68]: '\x0f\x01\x00y\x00\x01\x00\x00\x00\x00\x00\x00\x03\x03\x01\x00\x7f\x00\x01\x94\x00\x00\x00\x1a\x08\x02\x00\x00Tnot-found: Delete failed. No such queue: cinder-volume (qpid/broker/Broker.cpp:1119)\x00\x00\x00\x04\x00\x00\x00\x00\x0f\x00\x008\x00\x00\x00\x00\x00\x00\x00\x00\x02\x03\x01\x00\x00&419462f4-23a7-4d15-b1db-866aa4ffb380:0\x0f\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x01\x0c\x00\x00'
2013-05-27 04:10:23 DEBUG [qpid.messaging.io.ops] RCVD[37e8c68]: ExecutionException(error_code=404, command_id=serial(26), class_code=8, command_code=2, field_index=0, description=u'not-found: Delete failed. No such queue: cinder-volume (qpid/broker/Broker.cpp:1119)', error_info={}, id=serial(7))
2013-05-27 04:10:23 DEBUG [qpid.messaging.io.ops] RCVD[37e8c68]: SessionDetach(name='419462f4-23a7-4d15-b1db-866aa4ffb380:0')
2013-05-27 04:10:23 DEBUG [qpid.messaging.io.ops] SENT[37e8c68]: SessionDetached(name='419462f4-23a7-4d15-b1db-866aa4ffb380:0')
2013-05-27 04:10:23 DEBUG [qpid.messaging.io.ops] RCVD[37e8c68]: ConnectionCloseOk()
2013-05-27 04:10:23 INFO [cinder.service] Child 14594 exited with status 1

renminmin (rmm0811) said : #5

I tried to stop cinder-scheduler:
service openstack-cinder-scheduler stop
The end of the log prints an error; is this related to the previous problem?

2013-05-27 04:18:46 AUDIT [cinder.service] SIGTERM received
2013-05-27 04:18:46 DEBUG [qpid.messaging.io.ops] SENT[2e70c20]: SessionDetach(name='741d7405-dcdb-4c9c-8fdc-b1e45f519150:5')
2013-05-27 04:18:46 DEBUG [qpid.messaging.io.ops] SENT[2e70c20]: ConnectionClose(reply_code=200)
2013-05-27 04:18:46 DEBUG [qpid.messaging.io.raw] SENT[2e70c20]: '\x0f\x00\x008\x00\x00\x00\x00\x00\x00\x00\x00\x02\x03\x01\x00\x00&741d7405-dcdb-4c9c-8fdc-b1e45f519150:5\x0f\x00\x00\x12\x00\x00\x00\x00\x00\x00\x00\x00\x01\x0b\x01\x00\x00\xc8'
2013-05-27 04:18:46 DEBUG [qpid.messaging.io.raw] READ[2e70c20]: '\x0f\x00\x009\x00\x00\x00\x00\x00\x00\x00\x00\x02\x04\x03\x00\x00&741d7405-dcdb-4c9c-8fdc-b1e45f519150:5\x00\x0f\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x01\x0c\x00\x00'
2013-05-27 04:18:46 DEBUG [qpid.messaging.io.ops] RCVD[2e70c20]: SessionDetached(name='741d7405-dcdb-4c9c-8fdc-b1e45f519150:5', code=0)
2013-05-27 04:18:46 DEBUG [qpid.messaging.io.ops] RCVD[2e70c20]: ConnectionCloseOk()
2013-05-27 04:18:46 CRITICAL [cinder] need more than 0 values to unpack
Traceback (most recent call last):
  File "/usr/bin/cinder-scheduler", line 50, in <module>
    service.wait()
  File "/usr/lib/python2.6/site-packages/cinder/service.py", line 613, in wait
    rpc.cleanup()
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/__init__.py", line 240, in cleanup
    return _get_impl().cleanup()
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/impl_qpid.py", line 649, in cleanup
    return rpc_amqp.cleanup(Connection.pool)
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/amqp.py", line 671, in cleanup
    connection_pool.empty()
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/amqp.py", line 80, in empty
    self.get().close()
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/impl_qpid.py", line 386, in close
    self.connection.close()
  File "<string>", line 6, in close
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 316, in close
    ssn.close(timeout=timeout)
  File "<string>", line 6, in close
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 749, in close
    if not self._ewait(lambda: self.closed, timeout=timeout):
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 566, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 208, in _ewait
    result = self._wait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 193, in _wait
    return self._waiter.wait(predicate, timeout=timeout)
  File "/usr/lib/python2.6/site-packages/qpid/concurrency.py", line 57, in wait
    self.condition.wait(3)
  File "/usr/lib/python2.6/site-packages/qpid/concurrency.py", line 96, in wait
    sw.wait(timeout)
  File "/usr/lib/python2.6/site-packages/qpid/compat.py", line 53, in wait
    ready, _, _ = select([self], [], [], timeout)
ValueError: need more than 0 values to unpack

renminmin (rmm0811) said : #6

node1: service openstack-cinder-volume stop
dev202: tail -f /var/log/cinder/volume.log
It receives the message "Queue cinder-volume has been deleted":

2013-05-27 04:59:14 DEBUG [qpid.messaging.io.raw] READ[2900758]: '\x0f\x01\x00g\x00\x01\x00\x00\x00\x00\x00\x00\x03\x03\x01\x00\x7f\x00\x01\x98\x00\x00\x00\x00\x00\x00\x00\x00BQueue cinder-volume has been deleted. (qpid/broker/Queue.cpp:1787)\x00\x00\x00\x04\x00\x00\x00\x00\x0f\x00\x008\x00\x00\x00\x00\x00\x00\x00\x00\x02\x03\x01\x00\x00&338970db-7a5a-4993-b5c0-493b4fa2ce80:0'
2013-05-27 04:59:14 DEBUG [qpid.messaging.io.ops] RCVD[2900758]: ExecutionException(error_code=408, command_id=serial(0), class_code=0, command_code=0, field_index=0, description=u'Queue cinder-volume has been deleted. (qpid/broker/Queue.cpp:1787)', error_info={}, id=serial(7))
2013-05-27 04:59:14 DEBUG [qpid.messaging.io.ops] RCVD[2900758]: SessionDetach(name='338970db-7a5a-4993-b5c0-493b4fa2ce80:0')
2013-05-27 04:59:14 DEBUG [qpid.messaging.io.ops] SENT[2900758]: SessionDetached(name='338970db-7a5a-4993-b5c0-493b4fa2ce80:0')
2013-05-27 04:59:14 DEBUG [qpid.messaging.io.raw] SENT[2900758]: '\x0f\x00\x008\x00\x00\x00\x00\x00\x00\x00\x00\x02\x04\x01\x00\x00&338970db-7a5a-4993-b5c0-493b4fa2ce80:0'
2013-05-27 04:59:19 DEBUG [cinder.manager] Running periodic task VolumeManager._publish_service_capabilities
2013-05-27 04:59:19 DEBUG [cinder.manager] Notifying Schedulers of capabilities ...

renminmin (rmm0811) said : #7

Watching the active queues with qpid-tool.

When the cinder-volume services on node1 and dev202 are started,
list queue active:

    1962 09:58:23 - 301.cinder-scheduler
    1963 09:58:23 - 301.cinder-scheduler.node1
    1964 09:58:23 - 301.cinder-scheduler_fanout_37e2047fa48b46c2a08867b7e9d00b2a
    2444 10:04:22 - 301.cinder-volume
    2445 10:04:22 - 301.cinder-volume.dev202
    2446 10:04:22 - 301.cinder-volume_fanout_fac001a142eb411a9cc1473e51c86626
    2466 10:04:32 - 301.cinder-volume.node1
    2467 10:04:32 - 301.cinder-volume_fanout_bdfd1086647d4bb68859efebf01d77f7

However, after executing the command service openstack-cinder-volume stop on node1 or dev202,

list queue active:

    1962 09:58:23 - 301.cinder-scheduler
    1963 09:58:23 - 301.cinder-scheduler.node1
    1964 09:58:23 - 301.cinder-scheduler_fanout_37e2047fa48b46c2a08867b7e9d00b2a

all of the cinder-volume queues were removed.

renminmin (rmm0811) said : #8

I think I found a problem with qpid as the RPC backend, but I'm not sure about it. Could anyone try to reproduce it in your environment?

OpenStack Grizzly version.

The config file needs debug = True.

1. service openstack-cinder-scheduler stop (the same applies to nova-compute, nova-scheduler, etc.)
2. vi /var/log/cinder/scheduler.log — you will find info like this:

2013-05-27 06:02:46 CRITICAL [cinder] need more than 0 values to unpack
Traceback (most recent call last):
  File "/usr/bin/cinder-scheduler", line 50, in <module>
    service.wait()
  File "/usr/lib/python2.6/site-packages/cinder/service.py", line 613, in wait
    rpc.cleanup()
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/__init__.py", line 240, in cleanup
    return _get_impl().cleanup()
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/impl_qpid.py", line 649, in cleanup
    return rpc_amqp.cleanup(Connection.pool)
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/amqp.py", line 671, in cleanup
    connection_pool.empty()
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/amqp.py", line 80, in empty
    self.get().close()
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/impl_qpid.py", line 386, in close
    self.connection.close()
  File "<string>", line 6, in close
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 316, in close
    ssn.close(timeout=timeout)
  File "<string>", line 6, in close
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 749, in close
    if not self._ewait(lambda: self.closed, timeout=timeout):
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 566, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 208, in _ewait
    result = self._wait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 193, in _wait
    return self._waiter.wait(predicate, timeout=timeout)
  File "/usr/lib/python2.6/site-packages/qpid/concurrency.py", line 57, in wait
    self.condition.wait(3)
  File "/usr/lib/python2.6/site-packages/qpid/concurrency.py", line 96, in wait
    sw.wait(timeout)
  File "/usr/lib/python2.6/site-packages/qpid/compat.py", line 53, in wait
    ready, _, _ = select([self], [], [], timeout)
ValueError: need more than 0 values to unpack

renminmin (rmm0811) said : #9

I encountered this problem, but services other than cinder-volume never show it.
Then I found that the other services' logs print some critical info, with an error at self.connection.close().
So I deleted self.connection.close() (which should not really be removed), watched the qpid queue information, and the problem that confused me with multiple cinder-volumes disappeared.
In other words, not all of the cinder-volume queues were removed, just node1's cinder-volume.node1 and cinder-volume_fanout_bdfd1086647d4bb68859efebf01d77f7.

I think the problem may be a bug.

renminmin (rmm0811) said : #10

There are differences between /usr/bin/cinder-volume and the others (cinder-scheduler, nova-compute, nova-scheduler, etc.).
/usr/bin/cinder-volume service start method:
    launcher = service.ProcessLauncher()
    server = service.Service.create(binary='cinder-volume')
    launcher.launch_server(server)
    launcher.wait()

The others' start method:
    server = service.Service.create(binary='nova-compute',
                                    topic=CONF.compute_topic,
                                    db_allowed=False)
    service.serve(server)
    service.wait()

Then I changed /usr/bin/cinder-volume to the service.wait() method, the same as the others, and the problem that confused me disappeared.
When stopping the cinder-volume service, the critical info appears.

The difference between the two methods is whether a child process is forked or not.
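
As a purely illustrative sketch (not the actual OpenStack ProcessLauncher code), the fork-based pattern looks roughly like this; server here stands for any object with start(), wait(), and stop() methods. The parent forwards SIGTERM to the child, and the child stops the service, which is where its RPC connection and receivers get closed:

    import os
    import signal

    def launch(server):
        pid = os.fork()
        if pid == 0:
            # Child: run the service; on SIGTERM, stop it (closing its RPC
            # connection and receivers) and exit.
            def _sigterm(signum, frame):
                server.stop()
                os._exit(1)
            signal.signal(signal.SIGTERM, _sigterm)
            server.start()
            server.wait()
            os._exit(0)
        # Parent: forward SIGTERM to the child, then wait for it to exit.
        signal.signal(signal.SIGTERM,
                      lambda signum, frame: os.kill(pid, signal.SIGTERM))
        while True:
            try:
                os.waitpid(pid, 0)
                break
            except OSError:
                # waitpid is interrupted by the forwarded SIGTERM; retry.
                continue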

Is this the reason for the problem that confused me?

Could anyone help me?

Michael Basnight (hubcap) said : #11

The ProcessLauncher is there to spin up one or more child processes and control them. If you were running multi-backend with 4 backends, you would have 4 processes spun up by the process launcher, and when you exited cinder-volume it would close them all. The process launcher code does not do anything with the queues, though; that's in the Service code.

renminmin (rmm0811) said : #12

Hi Michael, thanks for your reply.
I think it is a bug in qpid as the RPC backend.

The other services (nova-compute, cinder-scheduler, etc.) use an eventlet thread to run the service. They stop the service with the thread's kill() method. The final rpc.cleanup() step does essentially nothing, because the consumer connection runs in the thread that was just killed; I think it is unnecessary. All the queues are auto-delete, so they are removed once all their receivers disappear.

However, cinder-volume uses a separate process to run the service, so stopping the service closes the connection, and the receivers (consumers) on the connection's session are closed when connection.close() is called.
Closing a receiver sends MessageCancel and QueueDelete messages to the broker (the qpid server), so all of the cinder-volume queues are removed.
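
As a rough illustration with the qpid.messaging API (the broker address, queue name, and address options below are assumptions for the sketch, not the exact options impl_qpid uses):

    from qpid.messaging import Connection

    conn = Connection("192.168.0.201:5672")   # assumed broker address
    conn.open()
    ssn = conn.session()
    # Attach a consumer to an auto-delete queue: the broker keeps the queue
    # alive while at least one receiver is attached.
    rcv = ssn.receiver(
        "cinder-volume ; {create: always, "
        "node: {type: queue, x-declare: {auto-delete: True}}}")
    # ... consume messages with rcv.fetch() and ssn.acknowledge() ...
    # In the volume.log above, this close is also where the client sends the
    # explicit QueueDelete that removes the shared cinder-volume queue.
    conn.close()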

I think that is the reason for the problem that confused me.

But I don't know how to solve it.

renminmin (rmm0811) said : #13

I am not sure whether setting session.receivers and session.senders directly to an empty list would solve the problem.

renminmin (rmm0811) said : #14

Adding a try ... except change at line 386 only fixes the exception raised when stopping cinder-scheduler or nova-compute (which have a similar implementation).
However, all of the cinder-volume queues are still removed when one of the multiple cinder-volume services stops. That is a separate problem.
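
For reference, a minimal sketch of that try ... except change, assuming line 386 of impl_qpid.py is the self.connection.close() call shown in the traceback and that the surrounding close() method does little more than that call:

    def close(self):
        """Close/release this connection (sketch; names other than
        self.connection.close() are assumptions)."""
        try:
            self.connection.close()
        except Exception:
            # qpid-python can raise "need more than 0 values to unpack" here
            # when the eventlet thread driving the connection has already been
            # killed; swallow it so shutdown can finish.
            pass
        self.connection = None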

I used the pdb module to trace the shutdown of the two different services (cinder-scheduler and cinder-volume).
Let me describe the two different shutdown implementations.

cinder-scheduler catches the stop signal and calls _launcher.stop() (cinder/service.py line 612).
_launcher.stop() kills all the service threads that run service.start and service.wait.
After the threads are killed, I found that connection.session.receivers is [], which means all consumers have been released. I'm not sure whether the connection is closed or not.
I found that the kill() method of the Service class is not called.

cinder-volume launches two processes: the service runs in the child process (service.py line 227) and the parent process watches the status of the child.
When the parent process catches the stop signal, it sends the stop signal to the child process.
The child process catches the signal and calls service.stop (service.py line 239).

I used pdb to trace the stop steps and found that connection.session.receivers is not [] and includes three receivers (cinder-volume, cinder-volume.node1, cinder-volume_fanout).
qpid removes the session's receivers, and MessageCancel and QueueDelete are then sent to qpidd.
I think the QueueDelete tells qpidd to delete all of the cinder-volume queues.

Yang Yu (yuyangbj) said : #15

Hi renminmin,

Can you check the UTC time in your environment? Maybe the gap between the UTC time on your scheduler node and on your volume node is larger than the default of 60 seconds. Try setting the service_down_time property in your cinder.conf to 180 or larger. Good luck!
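
For example, in the [DEFAULT] section of cinder.conf:

    [DEFAULT]
    # allow up to 180 seconds between service heartbeats (default is 60)
    service_down_time = 180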

renminmin (rmm0811) said : #16

Thanks for your reply.
I am sure about that; I use ntpd to keep the UTC time in sync.

+---------------------+---------------------+------------+---------+----+-----------------+------------------+------------------+--------------+----------+-------------------+
| created_at          | updated_at          | deleted_at | deleted | id | host            | binary           | topic            | report_count | disabled | availability_zone |
+---------------------+---------------------+------------+---------+----+-----------------+------------------+------------------+--------------+----------+-------------------+
| 2013-06-03 02:07:21 | 2013-06-03 02:14:33 | NULL       | 0       | 1  | node1           | cinder-scheduler | cinder-scheduler | 43           | 0        | test:dev188       |
| 2013-06-03 02:07:24 | 2013-06-03 02:14:34 | NULL       | 0       | 2  | node1           | cinder-volume    | cinder-volume    | 25           | 0        | test:dev188       |
| 2013-06-03 02:07:39 | 2013-06-03 02:14:28 | NULL       | 0       | 3  | dev202@driver_2 | cinder-volume    | cinder-volume    | 40           | 0        | test:dev202       |
| 2013-06-03 02:07:39 | 2013-06-03 02:14:28 | NULL       | 0       | 4  | dev202@driver_3 | cinder-volume    | cinder-volume    | 40           | 0        | test:dev202       |
| 2013-06-03 02:07:39 | 2013-06-03 02:14:28 | NULL       | 0       | 5  | dev202@driver_1 | cinder-volume    | cinder-volume    | 40           | 0        | test:dev202       |
+---------------------+---------------------+------------+---------+----+-----------------+------------------+------------------+--------------+----------+-------------------+
5 rows in set (0.00 sec)

I think my description of the cause of the problem in #14 is right.

renminmin (rmm0811) said : #18

I think this is a bug in the qpid Python client, and I have reported it to the Qpid project:
https://issues.apache.org/jira/browse/QPID-4903