closing AMQP connection <0.7717.4> (11.10.37.18:41437 -> 11.10.37.17:5672): missed heartbeats from client, timeout: 60s

Asked by kiiller

Hello all,
I hava a problem with nova (Mitaka),the situation is that in my MQ log ,the compute node can not connect the controller node all the time.For the moment the connection is OK, for the another moment the connection is missing,the MQ log is:
the controller node 11.10.37.17
the compute node 11.10.37.18
=ERROR REPORT==== 23-Mar-2017::14:38:03 ===
closing AMQP connection <0.30872.3> (11.10.37.18:41120 -> 11.10.37.17:5672):
missed heartbeats from client, timeout: 60s

=ERROR REPORT==== 23-Mar-2017::14:38:04 ===
closing AMQP connection <0.30903.3> (11.10.37.18:41123 -> 11.10.37.17:5672):
missed heartbeats from client, timeout: 60s

=ERROR REPORT==== 23-Mar-2017::14:39:03 ===
closing AMQP connection <0.30875.3> (11.10.37.18:41121 -> 11.10.37.17:5672):
missed heartbeats from client, timeout: 60s

=ERROR REPORT==== 23-Mar-2017::14:39:04 ===
closing AMQP connection <0.30900.3> (11.10.37.18:41122 -> 11.10.37.17:5672):
missed heartbeats from client, timeout: 60s

=INFO REPORT==== 23-Mar-2017::14:41:04 ===
accepting AMQP connection <0.1307.4> (11.10.37.18:41227 -> 11.10.37.17:5672)

=INFO REPORT==== 23-Mar-2017::14:41:04 ===
accepting AMQP connection <0.1304.4> (11.10.37.18:41226 -> 11.10.37.17:5672)

=INFO REPORT==== 23-Mar-2017::14:41:05 ===
accepting AMQP connection <0.1335.4> (11.10.37.18:41228 -> 11.10.37.17:5672)

=INFO REPORT==== 23-Mar-2017::14:41:05 ===
accepting AMQP connection <0.1338.4> (11.10.37.18:41229 -> 11.10.37.17:5672)

=ERROR REPORT==== 23-Mar-2017::14:44:04 ===
closing AMQP connection <0.1304.4> (11.10.37.18:41226 -> 11.10.37.17:5672):
missed heartbeats from client, timeout: 60s

=ERROR REPORT==== 23-Mar-2017::14:44:05 ===
closing AMQP connection <0.1338.4> (11.10.37.18:41229 -> 11.10.37.17:5672):
missed heartbeats from client, timeout: 60s

=ERROR REPORT==== 23-Mar-2017::14:45:04 ===
closing AMQP connection <0.1307.4> (11.10.37.18:41227 -> 11.10.37.17:5672):
missed heartbeats from client, timeout: 60s

=ERROR REPORT==== 23-Mar-2017::14:45:05 ===
closing AMQP connection <0.1335.4> (11.10.37.18:41228 -> 11.10.37.17:5672):
missed heartbeats from client, timeout: 60s

=INFO REPORT==== 23-Mar-2017::14:47:05 ===
accepting AMQP connection <0.4515.4> (11.10.37.18:41331 -> 11.10.37.17:5672)

=INFO REPORT==== 23-Mar-2017::14:47:05 ===
accepting AMQP connection <0.4512.4> (11.10.37.18:41330 -> 11.10.37.17:5672)

=INFO REPORT==== 23-Mar-2017::14:47:06 ===
accepting AMQP connection <0.4541.4> (11.10.37.18:41332 -> 11.10.37.17:5672)

=INFO REPORT==== 23-Mar-2017::14:47:06 ===
accepting AMQP connection <0.4544.4> (11.10.37.18:41333 -> 11.10.37.17:5672)

=ERROR REPORT==== 23-Mar-2017::14:50:05 ===
closing AMQP connection <0.4515.4> (11.10.37.18:41331 -> 11.10.37.17:5672):
missed heartbeats from client, timeout: 60s

=ERROR REPORT==== 23-Mar-2017::14:50:06 ===
closing AMQP connection <0.4544.4> (11.10.37.18:41333 -> 11.10.37.17:5672):
missed heartbeats from client, timeout: 60s

=ERROR REPORT==== 23-Mar-2017::14:51:05 ===
closing AMQP connection <0.4512.4> (11.10.37.18:41330 -> 11.10.37.17:5672):
missed heartbeats from client, timeout: 60s

=ERROR REPORT==== 23-Mar-2017::14:51:06 ===
closing AMQP connection <0.4541.4> (11.10.37.18:41332 -> 11.10.37.17:5672):
missed heartbeats from client, timeout: 60s

=INFO REPORT==== 23-Mar-2017::14:53:07 ===
accepting AMQP connection <0.7717.4> (11.10.37.18:41437 -> 11.10.37.17:5672)

=INFO REPORT==== 23-Mar-2017::14:53:07 ===
accepting AMQP connection <0.7714.4> (11.10.37.18:41436 -> 11.10.37.17:5672)

=INFO REPORT==== 23-Mar-2017::14:53:08 ===
accepting AMQP connection <0.7744.4> (11.10.37.18:41438 -> 11.10.37.17:5672)

=INFO REPORT==== 23-Mar-2017::14:53:08 ===
accepting AMQP connection <0.7747.4> (11.10.37.18:41439 -> 11.10.37.17:5672)

=ERROR REPORT==== 23-Mar-2017::14:56:07 ===
closing AMQP connection <0.7717.4> (11.10.37.18:41437 -> 11.10.37.17:5672):
missed heartbeats from client, timeout: 60s

=ERROR REPORT==== 23-Mar-2017::14:56:08 ===
closing AMQP connection <0.7744.4> (11.10.37.18:41438 -> 11.10.37.17:5672):
missed heartbeats from client, timeout: 60s
#####################################################
I'm sure that the nova.conf and the openstack-nova-compute.service is ok.
when the MQ connection is ok ,in the nova service-list ,the compute node is up .while then MQ connection is missing in the nova service-list the compute node is down,however in the nova-computer.log you can see :
###########################################################
2017-03-23 15:11:12.017 9788 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on 11.10.37.17:5672 via [amqp] client
2017-03-23 15:11:12.050 9788 DEBUG nova.compute.manager [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] There are 0 instances to clean _run_pending_deletes /usr/lib/python2.7/site-packages/nova/compute/manager.py:6757
2017-03-23 15:11:12.050 9788 DEBUG oslo_service.periodic_task [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Running periodic task ComputeManager._check_instance_build_time run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2017-03-23 15:11:12.051 9788 DEBUG oslo_service.periodic_task [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Running periodic task ComputeManager._heal_instance_info_cache run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2017-03-23 15:11:12.052 9788 DEBUG nova.compute.manager [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Starting heal instance info cache _heal_instance_info_cache /usr/lib/python2.7/site-packages/nova/compute/manager.py:5741
2017-03-23 15:11:12.052 9788 DEBUG nova.compute.manager [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Rebuilding the list of instances to heal _heal_instance_info_cache /usr/lib/python2.7/site-packages/nova/compute/manager.py:5745
2017-03-23 15:11:12.089 9788 DEBUG nova.compute.manager [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Didn't find any instances for network info cache update. _heal_instance_info_cache /usr/lib/python2.7/site-packages/nova/compute/manager.py:5817
2017-03-23 15:11:12.090 9788 DEBUG oslo_service.periodic_task [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Running periodic task ComputeManager._cleanup_incomplete_migrations run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2017-03-23 15:11:12.090 9788 DEBUG nova.compute.manager [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Cleaning up deleted instances with incomplete migration _cleanup_incomplete_migrations /usr/lib/python2.7/site-packages/nova/compute/manager.py:6783
2017-03-23 15:12:12.110 9788 DEBUG oslo_service.periodic_task [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Running periodic task ComputeManager._instance_usage_audit run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2017-03-23 15:12:12.130 9788 DEBUG oslo_service.periodic_task [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Running periodic task ComputeManager._poll_rebooting_instances run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2017-03-23 15:12:12.131 9788 DEBUG oslo_service.periodic_task [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Running periodic task ComputeManager._poll_volume_usage run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2017-03-23 15:12:12.131 9788 DEBUG oslo_service.periodic_task [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Running periodic task ComputeManager._reclaim_queued_deletes run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2017-03-23 15:12:12.132 9788 DEBUG nova.compute.manager [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] CONF.reclaim_instance_interval <= 0, skipping... _reclaim_queued_deletes /usr/lib/python2.7/site-packages/nova/compute/manager.py:6413
2017-03-23 15:12:12.132 9788 DEBUG oslo_service.periodic_task [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Running periodic task ComputeManager.update_available_resource run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2017-03-23 15:12:12.171 9788 INFO nova.compute.resource_tracker [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Auditing locally available compute resources for node computer1
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Error updating resources for node computer1.
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager Traceback (most recent call last):
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6460, in update_available_resource
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager rt.update_available_resource(context)
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 500, in update_available_resource
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager resources = self.driver.get_available_resource(self.nodename)
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5364, in get_available_resource
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager disk_info_dict = self._get_local_gb_info()
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5007, in _get_local_gb_info
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager info = LibvirtDriver._get_rbd_driver().get_pool_info()
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/storage/rbd_utils.py", line 365, in get_pool_info
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager with RADOSClient(self) as client:
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/storage/rbd_utils.py", line 92, in __init__
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager self.cluster, self.ioctx = driver._connect_to_rados(pool)
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/storage/rbd_utils.py", line 123, in _connect_to_rados
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager client.connect()
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager File "rados.pyx", line 785, in rados.Rados.connect (rados.c:8969)
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager TimedOut: error connecting to the cluster
2017-03-23 15:17:12.205 9788 ERROR nova.compute.manager
2017-03-23 15:17:12.208 9788 DEBUG oslo.messaging._drivers.impl_rabbit [-] Received recoverable error from kombu: on_error /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:674
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 436, in _ensured
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit return fun(*args, **kwargs)
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 508, in __call__
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit return fun(*args, channel=channels[0], **kwargs), channels[0]
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 723, in execute_method
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit method()
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 986, in _consume
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit self.connection.drain_events(timeout=poll_timeout)
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 275, in drain_events
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit return self.transport.drain_events(self.connection, **kwargs)
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 95, in drain_events
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit return connection.drain_events(**kwargs)
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 302, in drain_events
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit chanmap, None, timeout=timeout,
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 365, in _wait_multiple
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit channel, method_sig, args, content = read_timeout(timeout)
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 336, in read_timeout
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit return self.method_reader.read_method()
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit raise m
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit IOError: Socket closed
2017-03-23 15:17:12.208 9788 ERROR oslo.messaging._drivers.impl_rabbit
2017-03-23 15:17:12.209 9788 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 11.10.37.17:5672 closed the connection. Check login credentials: Socket closed
2017-03-23 15:17:12.210 9788 DEBUG oslo.messaging._drivers.impl_rabbit [-] Received recoverable error from kombu: on_error /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:674
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 436, in _ensured
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit return fun(*args, **kwargs)
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 508, in __call__
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit return fun(*args, channel=channels[0], **kwargs), channels[0]
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 723, in execute_method
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit method()
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 986, in _consume
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit self.connection.drain_events(timeout=poll_timeout)
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 275, in drain_events
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit return self.transport.drain_events(self.connection, **kwargs)
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 95, in drain_events
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit return connection.drain_events(**kwargs)
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 302, in drain_events
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit chanmap, None, timeout=timeout,
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 365, in _wait_multiple
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit channel, method_sig, args, content = read_timeout(timeout)
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 336, in read_timeout
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit return self.method_reader.read_method()
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit raise m
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit IOError: Socket closed
2017-03-23 15:17:12.210 9788 ERROR oslo.messaging._drivers.impl_rabbit
2017-03-23 15:17:12.211 9788 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 11.10.37.17:5672 closed the connection. Check login credentials: Socket closed
2017-03-23 15:17:12.212 9788 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2017-03-23 15:17:12.226 9788 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Socket closed
2017-03-23 15:17:12.238 9788 DEBUG oslo_service.periodic_task [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Running periodic task ComputeManager._poll_unconfirmed_resizes run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2017-03-23 15:17:12.239 9788 DEBUG oslo_service.periodic_task [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Running periodic task ComputeManager._sync_power_states run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2017-03-23 15:17:13.243 9788 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on 11.10.37.17:5672 via [amqp] client
2017-03-23 15:17:13.250 9788 WARNING nova.compute.manager [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] While synchronizing instance power states, found 0 instances in the database and 3 instances on the hypervisor.
2017-03-23 15:17:13.251 9788 DEBUG oslo_service.periodic_task [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Running periodic task ComputeManager._sync_scheduler_instance_info run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2017-03-23 15:17:13.253 9788 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on 11.10.37.17:5672 via [amqp] client
2017-03-23 15:17:13.299 9788 DEBUG oslo_service.periodic_task [req-942532b7-de95-4997-a794-21ebe1119f3a - - - - -] Running periodic task ComputeManager._poll_rescued_instances run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
#############################################################
I also create a new compute node with the same nova.conf profile it work well .I don't know what wrong with it .who can help me ?

Question information

Language:
English Edit question
Status:
Expired
For:
Ubuntu nova Edit question
Assignee:
No assignee Edit question
Last query:
Last reply:
Revision history for this message
Launchpad Janitor (janitor) said :
#1

This question was expired because it remained in the 'Open' state without activity for the last 15 days.