[MultiZones] NoValidHost: No hosts were available

Asked by Édouard Thuleau

Hi,
I tried to set up zones with nova rev 1244.

I set up a parent zone named 'nova' and a child zone named 'nova1'.
The zone 'nova' has no compute resources, and the zone 'nova1' has 2 compute hosts.

Config flags:
--allow_admin_api=True
--enable_zone_routing=true
--zone_name=nova
--build_plan_encryption_key=c286696d887c9aa0611bbba32025a450
--scheduler_driver=nova.scheduler.host_filter.HostFilterScheduler
--default_host_filter=nova.scheduler.host_filter.AllHostsFilter

I added the child zone with this command on the parent OSAPI:
$ nova zone-add http://10.193.175.142:8774/v1.0/ admin admin 0.1 0.9
+---------------+----------------------------------+
| Property | Value |
+---------------+----------------------------------+
| api_url | http://10.193.175.142:8774/v1.0/ |
| id | 1 |
| weight_offset | 0.1 |
| weight_scale | 0.9 |
+---------------+----------------------------------+
$ nova zone-list
+----+------+-----------+----------------------------------+---------------+--------------+
| ID | Name | Is Active | API URL | Weight Offset | Weight Scale |
+----+------+-----------+----------------------------------+---------------+--------------+
| 1 | n/a | n/a | http://10.193.175.142:8774/v1.0/ | 0.1 | 0.9 |
+----+------+-----------+----------------------------------+---------------+--------------+

When I try to start an instance, the distributed scheduler cannot find an available host.

Parent scheduler log:
2011-07-06 19:21:33,448 DEBUG nova.rpc [-] received {u'_context_request_id': u'-5N6RP1OWHAA0HE-JXR7', u'_context_read_deleted': False, u'args': {u'topic': u'compute', u'request_spec': {u'instance_properties': {u'state_description': u'scheduling', u'availability_zone': None, u'ramdisk_id': u'', u'instance_type_id': 5, u'user_data': u'', u'vm_mode': None, u'reservation_id': u'r-gxyzgouv', u'user_id': u'admin', u'dis
play_description': None, u'key_data': u'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQCWOCOz8ghwivuxLKllRAFuyO54bVMKJ/n98hYvvbYhqlUFoMu5ac6HFKENZms
/85tD2FS7rN2iEhB7mkxnbX/8Qhh6d+24YEHVNCmJBfqyztLRu86fpup3IziT2yv2TyLZtnl8lk5MmFzgcxa3vifwGLglJuTQ0x/BDbFvHJZuJw== nova@p-novamaster\n', u'state': 0, u'project_id': u'simple', u'metadata': {}, u'kernel_id': u'', u'key_name': u'key', u'display_name': None, u'local_gb': 20, u'lock
ed': False, u'launch_time': u'2011-07-06T17:21:33Z', u'memory_mb': 2048, u'vcpus': 1, u'image_ref': 4, u'architecture': None, u'os_type': N
one}, u'instance_type': {u'rxtx_quota': 0, u'flavorid': 2, u'deleted_at': None, u'name': u'm1.small', u'deleted': False, u'created_at': None, u'updated_at': None, u'memory_mb': 2048, u'vcpus': 1, u'rxtx_cap': 0, u'extra_specs': {}, u'swap': 0, u'local_gb': 20, u'id': 5}, u'num_instances': 1, u'filter': u'nova.scheduler.host_filter.InstanceTypeFilter', u'blob': None}, u'availability_zone': None, u'instance_id': 2,
u'admin_password': None, u'injected_files': None}, u'_context_is_admin': True, u'_context_timestamp': u'2011-07-06T17:21:33Z', u'_context_u
ser': u'admin', u'method': u'run_instance', u'_context_project': u'simple', u'_context_remote_address': u'10.193.118.30'} from (pid=6096) p
rocess_data /usr/lib/pymodules/python2.7/nova/rpc.py:202
2011-07-06 19:21:33,448 DEBUG nova.rpc [-] unpacked context: {'timestamp': u'2011-07-06T17:21:33Z', 'msg_id': None, 'remote_address': u'10.
193.118.30', 'project': u'simple', 'is_admin': True, 'user': u'admin', 'request_id': u'-5N6RP1OWHAA0HE-JXR7', 'read_deleted': False} from (
pid=6096) _unpack_context /usr/lib/pymodules/python2.7/nova/rpc.py:451
2011-07-06 19:21:33,449 DEBUG nova.scheduler.zone_aware_scheduler [-] Attempting to build 1 instance(s) from (pid=6096) schedule_run_instan
ce /usr/lib/pymodules/python2.7/nova/scheduler/zone_aware_scheduler.py:222
2011-07-06 19:21:33,450 WARNING nova.scheduler.zone_aware_scheduler [-] Filter returned no hosts after processing 0 of 1 instances
2011-07-06 19:21:33,467 DEBUG novaclient.client [-] REQ: curl -i http://10.193.175.142:8774/v1.0/ -X GET -H "X-Auth-Key: admin" -H "X-Auth-
User: admin" -H "User-Agent: python-novaclient/2.4"
 from (pid=6096) http_log /usr/local/lib/python2.7/dist-packages/python_novaclient-2.5.7-py2.7.egg/novaclient/client.py:58
2011-07-06 19:21:33,467 DEBUG novaclient.client [-] RESP:{'status': '204', 'content-length': '0', 'x-auth-token': 'db2a23bd0664fb9f09b1ad4d
6d010ce24b1acac8', 'x-cdn-management-url': '', 'x-server-management-url': 'http://10.193.175.142:8774/v1.0/', 'date': 'Wed, 06 Jul 2011 17:
21:33 GMT', 'x-storage-url': '', 'content-type': 'text/plain; charset=UTF-8'}
 from (pid=6096) http_log /usr/local/lib/python2.7/dist-packages/python_novaclient-2.5.7-py2.7.egg/novaclient/client.py:59
2011-07-06 19:21:33,646 DEBUG novaclient.client [-] REQ: curl -i http://10.193.175.142:8774/v1.0//zones/select -X POST -H "User-Agent: pyth
on-novaclient/2.4" -H "Content-Type: application/json" -H "X-Auth-Token: db2a23bd0664fb9f09b1ad4d6d010ce24b1acac8"
 from (pid=6096) http_log /usr/local/lib/python2.7/dist-packages/python_novaclient-2.5.7-py2.7.egg/novaclient/client.py:58
2011-07-06 19:21:33,647 DEBUG novaclient.client [-] RESP:{'date': 'Wed, 06 Jul 2011 17:21:34 GMT', 'status': '200', 'content-length': '15',
 'content-type': 'application/json'} {"weights": []}
 from (pid=6096) http_log /usr/local/lib/python2.7/dist-packages/python_novaclient-2.5.7-py2.7.egg/novaclient/client.py:59
2011-07-06 19:21:33,647 ERROR nova [-] Exception during message handling
(nova): TRACE: Traceback (most recent call last):
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/rpc.py", line 232, in _process_data
(nova): TRACE: rval = node_func(context=ctxt, **node_args)
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/scheduler/manager.py", line 90, in _schedule
(nova): TRACE: **kwargs)
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/scheduler/zone_aware_scheduler.py", line 227, in schedule_run_instance
(nova): TRACE: raise driver.NoValidHost(_('No hosts were available'))
(nova): TRACE: NoValidHost: No hosts were available
(nova): TRACE:

Child scheduler log:
2011-07-06 19:21:33,432 DEBUG nova [-] Updating zone cache from db. from (pid=18130) ping /usr/lib/pymodules/python2.7/nova/scheduler/zone_manager.py:165
2011-07-06 19:21:33,997 DEBUG nova.rpc [-] received {u'_msg_id': u'4de4397f2758471998c7ffb2b1a9d889', u'_context_read_deleted': False, u'_context_request_id': u'C5NRQS0EKXWT7QTNK5AU', u'args': {u'request_spec': {u'instance_properties': {u'state_description': u'scheduling', u'availability_zone': None, u'ramdisk_id': u'', u'instance_type_id': 5, u'user_data': u'', u'vm_mode': None, u'reservation_id': u'r-gxyzgouv', u'user_id': u'admin', u'display_description': None, u'key_data': u'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQCWOCOz8ghwivuxLKllRAFuyO54bVMKJ/n98hYvvbYhqlUFoMu5ac6HFKENZms/85tD2FS7rN2iEhB7mkxnbX/8Qhh6d+24YEHVNCmJBfqyztLRu86fpup3IziT2yv2TyLZtnl8lk5MmFzgcxa3vifwGLglJuTQ0x/BDbFvHJZuJw== nova@p-novamaster\n', u'state': 0, u'project_id': u'simple', u'metadata': {}, u'kernel_id': u'', u'key_name': u'key', u'display_name': None, u'local_gb': 20, u'locked': False, u'launch_time': u'2011-07-06T17:21:33Z', u'memory_mb': 2048, u'vcpus': 1, u'image_ref': 4, u'architecture': None, u'os_type': None}, u'instance_type': {u'rxtx_quota': 0, u'flavorid': 2, u'name': u'm1.small', u'deleted': False, u'created_at': None, u'updated_at': None, u'memory_mb': 2048, u'vcpus': 1, u'rxtx_cap': 0, u'extra_specs': {}, u'swap': 0, u'deleted_at': None, u'id': 5, u'local_gb': 20}, u'num_instances': 1, u'filter': u'nova.scheduler.host_filter.InstanceTypeFilter', u'blob': None}}, u'_context_is_admin': True, u'_context_timestamp': u'2011-07-06T17:21:33Z', u'_context_user': u'admin', u'method': u'select', u'_context_project': u'simple', u'_context_remote_address': None} from (pid=18130) process_data /usr/lib/pymodules/python2.7/nova/rpc.py:202
2011-07-06 19:21:33,998 DEBUG nova.rpc [-] unpacked context: {'timestamp': u'2011-07-06T17:21:33Z', 'msg_id': u'4de4397f2758471998c7ffb2b1a9d889', 'remote_address': None, 'project': u'simple', 'is_admin': True, 'user': u'admin', 'request_id': u'C5NRQS0EKXWT7QTNK5AU', 'read_deleted': False} from (pid=18130) _unpack_context /usr/lib/pymodules/python2.7/nova/rpc.py:451
2011-07-06 19:21:33,998 WARNING nova.scheduler.zone_aware_scheduler [-] Filter returned no hosts after processing 0 of 1 instances

Question information

Language: English
Status: Expired
For: OpenStack Compute (nova)
Assignee: No assignee

This question was reopened

Revision history for this message
Ed Leafe (ed-leafe) said :
#1

How long after starting up the compute service did you attempt to create the instance? Hosts report their status on a regular basis, defined by FLAGS.periodic_interval, which I believe defaults to 60 seconds. What that means is that for the first 60 seconds a host is running, the zone may not know about it.

Try again, but wait at least a minute before attempting to create the instance. It should work fine then; if not, post the error you receive.

Revision history for this message
Édouard Thuleau (ethuleau) said :
#2

Hi Ed,

I waited out that interval. I checked again and I still have the problem.
I found something strange. I added some logging in the source file 'nova/scheduler/host_filter.py' to check which default host filter the scheduler uses, and it uses 'InstanceTypeFilter', even though I set the flag 'default_host_filter' to 'nova.scheduler.host_filter.AllHostsFilter'. You can see that in the first line of the child or parent logs: 'u'num_instances': 1, u'filter': u'nova.scheduler.host_filter.InstanceTypeFilter', u'blob': None'

I've got other questions about Multizones:

1) How is a project distributed between zones? Is it possible to have some instances of project 1 in zone 1 and other instances of project 1 in zone 2?

2) How must VLAN network mode be set up between zones? One nova-network daemon in each zone? The same VLAN in each zone?

Revision history for this message
Ed Leafe (ed-leafe) said :
#3

> I waited out that interval. I checked again and I still have the
> problem. I found something strange. I added some logging in the source
> file 'nova/scheduler/host_filter.py' to check which default host filter
> the scheduler uses, and it uses 'InstanceTypeFilter', even though I set
> the flag 'default_host_filter' to
> 'nova.scheduler.host_filter.AllHostsFilter'. You can see that in the
> first line of the child or parent logs: 'u'num_instances': 1,
> u'filter': u'nova.scheduler.host_filter.InstanceTypeFilter', u'blob': None'

The InstanceTypeFilter is hard-coded into nova/compute/api.py. Setting the flag has no effect.

If you are still getting the 'no hosts' error, then your compute host does not have the capabilities required by the instance being created. To debug this, go into the 'filter_hosts()' method of nova/scheduler/host_filter.py and, inside the loop beginning with "for host, services in zone_manager.service_states.iteritems():", add logging that outputs all the values created in that loop, to determine whether the host is capable of creating the instance.
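A minimal, self-contained sketch of that kind of logging. The nested dict shape and the standalone harness are assumptions for illustration; the real code iterates zone_manager.service_states with iteritems() under Python 2:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
LOG = logging.getLogger("nova.scheduler.host_filter")


def log_host_capabilities(service_states):
    """Log what every host reported, mirroring the loop in filter_hosts().

    If this prints nothing, service_states is empty and no host has
    reported its capabilities to the zone manager yet.
    """
    for host, services in sorted(service_states.items()):
        for svc, capabilities in sorted(services.items()):
            LOG.debug("host=%s service=%s capabilities=%r",
                      host, svc, capabilities)
    return service_states


# Fabricated entry showing the expected nesting: host -> service -> caps.
states = {"p-hs21-16": {"compute": {"host_memory_free": 31436}}}
log_host_capabilities(states)
```

An empty dict here (as reported later in the thread) means the problem is upstream of the filter: capabilities are never arriving.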

> I've got other questions about Multizones:
>
> 1) How is a project distributed between zones? Is it possible to have
> some instances of project 1 in zone 1 and other instances of project 1
> in zone 2?

Right now it is possible, but you have to make sure that the project credentials are duplicated across zones. When Keystone is integrated, that will no longer be necessary.

> 2) How must VLAN network mode be set up between zones? One
> nova-network daemon in each zone? The same VLAN in each zone?

Yes, there must be at least one network service running in each zone. I'm no expert on running VLAN mode, but since each zone can be configured independently, I suspect you don't need the VLAN settings to be the same in different zones; you should ask someone who knows more about networking.

Revision history for this message
Édouard Thuleau (ethuleau) said :
#4

Why is it hard-coded? Is it still under development?

In fact, zone_manager.service_states.iteritems() returns an empty dict, so the loop never runs.

For my question 2): I think that if a project can be distributed across different zones, the VLAN of the project must be the same in each zone. In that case, the layer-2 network is stretched across zones.

Revision history for this message
Ed Leafe (ed-leafe) said :
#5

> Why is it hard-coded? Is it still under development?

I don't know why - maybe Sandy Walsh can explain. But the _ask_scheduler_to_create_instance() method in nova/compute/api.py hard-codes that value.

Revision history for this message
Édouard Thuleau (ethuleau) said :
#6

Do you know why the 'zone_manager.service_states.iteritems()' method returns an empty dict?

I've got two compute hosts in my zone with enough resources:

$ nova-manage service list
p-novamaster2 nova-vncproxy enabled :-) 2011-07-08 07:24:36
p-novamaster2 nova-volume enabled :-) 2011-07-08 07:24:36
p-novamaster2 nova-scheduler enabled :-) 2011-07-08 07:24:36
p-hs21-16 nova-compute enabled :-) 2011-07-08 07:24:36
p-hs21-17 nova-network enabled :-) 2011-07-08 07:24:36
p-hs21-17 nova-compute enabled :-) 2011-07-08 07:24:36

$ nova-manage service describe_resource p-hs21-16
HOST PROJECT cpu mem(mb) disk(gb)
p-hs21-16(total) 8 32240 36
p-hs21-16(used) 0 804 4

$ nova-manage service describe_resource p-hs21-17
HOST PROJECT cpu mem(mb) disk(gb)
p-hs21-17(total) 8 32240 36
p-hs21-17(used) 0 910 5

Revision history for this message
Édouard Thuleau (ethuleau) said :
#7

In fact, the method 'update_service_capabilities' in zone_manager.py is never called.
So service_states is never updated and stays empty.

Revision history for this message
Ed Leafe (ed-leafe) said :
#8

> In fact, the method 'update_service_capabilities' in zone_manager.py is never called.
> So service_states is never updated and stays empty.

Do you have the 'rabbit_host' flag configured on your various nodes? All nodes in a zone need to have that set to the address of the machine running the rabbitmq-server service. If not, then the messages are not being sent to the proper queue, and would therefore never reach the zone manager.
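Concretely, each node's flag file would carry a line like the following, in the same style as the flags quoted at the top of the question (the address here is a placeholder for wherever rabbitmq-server actually runs):

```
--rabbit_host=10.0.0.1
```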

Revision history for this message
Édouard Thuleau (ethuleau) said :
#9

Yes, I set that flag.
If I configure the zone in classic (single-zone) mode, it works fine.

I made a test: I added code to a nova-manage command to force a call to the 'update_service_capabilities' method:
        rpc.call(ctxt,
                 db.queue_get_for(ctxt, FLAGS.scheduler_topic, 'p-novamaster2'),
                 {"method": "update_service_capabilities"})

With that, 'service_states' gets filled, but I get another error:

2011-07-08 14:53:05,303 ERROR nova [-] Exception during message handling
(nova): TRACE: Traceback (most recent call last):
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/rpc.py", line 232, in _process_data
(nova): TRACE: rval = node_func(context=ctxt, **node_args)
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/scheduler/manager.py", line 90, in _schedule
(nova): TRACE: **kwargs)
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/scheduler/zone_aware_scheduler.py", line 225, in schedule_run_instance
(nova): TRACE: build_plan = self.select(context, request_spec)
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/scheduler/zone_aware_scheduler.py", line 248, in select
(nova): TRACE: *args, **kwargs)
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/scheduler/zone_aware_scheduler.py", line 283, in _schedule
(nova): TRACE: host_list = self.filter_hosts(topic, request_spec, host_list)
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/scheduler/host_filter.py", line 348, in filter_hosts
(nova): TRACE: return host_filter.filter_hosts(self.zone_manager, query)
(nova): TRACE: File "/usr/lib/pymodules/python2.7/nova/scheduler/host_filter.py", line 127, in filter_hosts
(nova): TRACE: host_ram_mb = capabilities['host_memory_free']
(nova): TRACE: KeyError: 'host_memory_free'
(nova): TRACE:

Revision history for this message
Sandy Walsh (sandy-walsh) said :
#10

It's hard-coded because the only API methods for creating instances are based on InstanceTypes (aka Flavors). There are other groups that want more elaborate means of expressing the type of instance created; they would be the ones to "unstick" this setting.

As for update_service_capabilities: we first need to ensure the periodic task is being called and that it has something to report. The network and volume services currently don't have any capabilities to report, so we'll never hear from them. It's only compute so far, and even then only with XenServer. If you're using another hypervisor, you need to add your own hook to extract the HV stats and call update_service_capabilities.

Which virt layer are you using?

Revision history for this message
Édouard Thuleau (ethuleau) said :
#11

Hi Sandy,

I use the libvirt/KVM hypervisor. So is multi-cluster only supported with the XenServer hypervisor?
The libvirt driver reports some host stats, like these:

$ nova-manage service list
p-novamaster2 nova-vncproxy enabled :-) 2011-07-08 07:24:36
p-novamaster2 nova-volume enabled :-) 2011-07-08 07:24:36
p-novamaster2 nova-scheduler enabled :-) 2011-07-08 07:24:36
p-hs21-16 nova-compute enabled :-) 2011-07-08 07:24:36
p-hs21-17 nova-network enabled :-) 2011-07-08 07:24:36
p-hs21-17 nova-compute enabled :-) 2011-07-08 07:24:36

$ nova-manage service describe_resource p-hs21-16
HOST PROJECT cpu mem(mb) disk(gb)
p-hs21-16(total) 8 32240 36
p-hs21-16(used) 0 804 4

Isn't that enough for the host_filter method?

Revision history for this message
Sandy Walsh (sandy-walsh) said :
#12

No, multi-cluster works with all HVs.

Look at periodic_tasks() in nova/compute/manager.py ... you'll see that the _report_driver_status() method is called every 30 seconds or so. This call delegates to the virt layer to get the capabilities via the get_host_stats() call.

Currently, only the XenServer driver has this method implemented; KVM doesn't. But it should be pretty easy to add (and I'm sure it would be a very welcome addition). We didn't add it since we're a XenServer shop.
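A rough sketch of the shape such a hook could return. 'host_memory_free' matches the key the filter reads in the traceback above, but the other field names, and the idea of passing the raw numbers in as parameters rather than querying libvirt (e.g. via conn.getInfo()), are assumptions for illustration:

```python
def build_host_stats(mem_total_bytes, mem_free_bytes,
                     disk_total_gb, disk_available_gb, vcpus):
    """Assemble a capabilities dict in roughly the shape the host
    filter expects.

    A real hook would pull these numbers from libvirt (memory) and the
    filesystem (disk); they are plain parameters here so the sketch
    stays self-contained. Only 'host_memory_free' is confirmed by the
    thread's traceback; the other keys are illustrative.
    """
    MB = 1024 ** 2
    return {
        "host_memory_total": mem_total_bytes // MB,  # MB
        "host_memory_free": mem_free_bytes // MB,    # MB
        "disk_total": disk_total_gb,                 # GB
        "disk_available": disk_available_gb,         # GB
        "vcpus": vcpus,
    }


# Fabricated numbers roughly matching the describe_resource output above:
stats = build_host_stats(32240 * 1024 ** 2, 31436 * 1024 ** 2, 36, 31, 8)
```

The compute manager would then pass this dict to update_service_capabilities so it reaches the zone manager's service_states.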

Hope it helps!

Revision history for this message
Édouard Thuleau (ethuleau) said :
#13

Ok, thanks for your help.
I don't know if I have enough development experience to do that, but I will try.

Revision history for this message
Édouard Thuleau (ethuleau) said :
#14

Hi Sandy,

I tried to implement the code for get_host_stats() in the 'libvirt' virt layer.
I checked what was done for XenServer, and there's something I don't understand.

The 'filter_hosts' method in the 'InstanceTypeFilter' class compares the capabilities of the host with the resources specified for the instance. But the host capabilities are in bytes, while the instance resources are in MB (for memory) or GB (for disk). Is that intentional?

Revision history for this message
Sandy Walsh (sandy-walsh) said :
#15

Hey!

Yeah, we haven't really nailed down the units of measure for all that stuff yet, or the expected field names. For now, I'd assume MB for RAM and GB for disk. I'll file a bug to get it all on the same page.
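Under that assumption, a byte-valued capability would need normalizing before being compared against the instance type. A minimal sketch; the field names are illustrative except 'host_memory_free', which the traceback above shows the filter reading:

```python
def normalize_capabilities(caps_bytes):
    """Convert byte-valued host capabilities to the MB (RAM) / GB (disk)
    units that instance types use."""
    MB = 1024 ** 2
    GB = 1024 ** 3
    return {
        "host_memory_free": caps_bytes["host_memory_free"] // MB,
        "disk_available": caps_bytes["disk_available"] // GB,
    }


caps = normalize_capabilities({
    "host_memory_free": 2048 * 1024 ** 2,  # 2048 MB reported in bytes
    "disk_available": 20 * 1024 ** 3,      # 20 GB reported in bytes
})
# caps == {'host_memory_free': 2048, 'disk_available': 20}
```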

Thanks for the reminder on that!

Revision history for this message
Launchpad Janitor (janitor) said :
#16

This question was expired because it remained in the 'Open' state without activity for the last 15 days.