Heartbeats for compute nodes?

Asked by fred yang

nova-compute creates a "nova-compute" entry in the cloud controller's DB.services table, and then periodically updates that row's updated_at and report_count fields.

Is there any way for the cloud controller to know that a compute node has been rebooted or that the nova-compute service has been restarted? Is there any event-notification mechanism other than continually polling the nova DB and monitoring updated_at and report_count?

Thanks
-Fred

Question information

Language: English
Status: Solved
For: OpenStack Compute (nova)
Solved by: fred yang
Sandy Walsh (sandy-walsh) said :
#1

Look at nova.manager.SchedulerDependentManager

Any service that is alive (and has capabilities defined) will report to the ZoneManager in the schedulers when it comes up. If you look at nova.scheduler.api, you'll see how you can query the schedulers to get status.

This is all done via AMQP and does not require DB updates/changes.
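A minimal sketch of that query path, assuming the nova.scheduler.api module from this era exposes get_zone_capabilities(context) as described later in this thread (the exact signature and return shape are assumptions, not verbatim nova code):

from nova import context
from nova.scheduler import api as scheduler_api

# Capabilities aggregated by the schedulers from every service that is
# reporting over AMQP; a host that stops reporting simply drops out of
# this view, so no direct DB polling is needed.
ctxt = context.get_admin_context()
caps = scheduler_api.get_zone_capabilities(ctxt)
print(caps)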

Everett Toews (everett-toews) said :
#2

I use

nova-manage service list | sort

and when I want to monitor it

watch -n 2 "nova-manage service list | sort"

Everett


Sandy Walsh (sandy-walsh) said :
#3

Hmm, that's interesting. I should add a hook in nova-manage to dump the ZoneManager info. That would be really useful.

fred yang (fred-yang) said :
#4

OK, it seems polling is the way.
But I don't see the service's created time get reset when a compute node is rebooted - I guess the logic only calls compute_node_update() if it finds an existing service entry.
Is there any way to identify that a compute node has been rebooted since a predefined time? The "watch" mechanism can only identify a newly online node that doesn't yet have a service entry.
I would like to build host trustworthiness state and rebuild that information whenever a node is rebooted or comes online.
Thanks,
-Fred

Sandy Walsh (sandy-walsh) said :
#5

You'd have to put a special query in the scheduler driver/ZoneManager for that ... additionally, in nova.manager you may need to add a one-time flag recording when the service was last booted.

Both would be pretty straightforward (and handy) additions.

fred yang (fred-yang) said :
#6

Sandy,

So the check can be derived from ZoneManager.ping -> scheduler.hosts_up -> service_is_up for the zone?
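For reference, the service_is_up check at the end of that chain is just a timestamp comparison against the heartbeat columns mentioned in the question; a minimal sketch of the idea (the constant and function shown here are illustrative, nova drives this from a service_down_time flag):

import datetime

SERVICE_DOWN_TIME = 60  # seconds; nova configures this via a flag

def service_is_up(service):
    # updated_at is bumped by the periodic report_count heartbeat;
    # fall back to created_at for a service that has never reported.
    last_heartbeat = service['updated_at'] or service['created_at']
    elapsed = datetime.datetime.utcnow() - last_heartbeat
    return elapsed.total_seconds() <= SERVICE_DOWN_TIME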

Thanks,
-Fred

Sandy Walsh (sandy-walsh) said :
#7

The member variable in ZoneManager you're interested in is:

ZoneManager.service_states = {} # { <host> : { <service> : { cap k : v }}}

Look at nova.scheduler.api.get_zone_capabilities() to see how to call the Scheduler to query the ZoneManager (and the related nova.scheduler.manager.get_zone_capabilities() for the server-side counterpart)

That will give you the means to query for the info.

To give the Service the ability to report when it was last restarted, look at:

nova.manager.SchedulerDependentManager

I'd put a self.started_datetime (or something) in __init__() and initialize it to the UTC time. Tack it into the self.last_capabilities member variable in update_service_capabilities() and it'll get sent to the Schedulers on every update.
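A rough sketch of that change, assuming the SchedulerDependentManager layout of this era (attribute names other than last_capabilities and update_service_capabilities are illustrative, and the real class does more than this):

from nova import manager
from nova import utils

class SchedulerDependentManager(manager.Manager):
    """Sketch: record the service start time and publish it as a capability."""

    def __init__(self, host=None, db_driver=None, service_name='undefined'):
        # Assumed addition: remember when this service process came up (UTC).
        self.started_datetime = utils.utcnow()
        self.last_capabilities = None
        self.service_name = service_name
        super(SchedulerDependentManager, self).__init__(host, db_driver)

    def update_service_capabilities(self, capabilities):
        # Tack the start time onto whatever the driver reported, so the
        # schedulers' ZoneManager sees it on every periodic update.
        capabilities['started_datetime'] = self.started_datetime
        self.last_capabilities = capabilities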

Let me know if that makes sense (I think I'm answering the right question? :)

-S

fred yang (fred-yang) said :
#8

You may have addressed two issues for me :-) though my usage model may be beyond the scope of this question.

1. Adding started_time (or service_boot_time) to the last_capabilities posted to the scheduler, which can be used to check whether a node's service has restarted, or by Host_filter drivers to exclude newly booted compute nodes, if needed, during zone_aware_scheduler - a sketch of such a filter follows below.

2. The scheduler.api.get_zone_capabilities() query can be adapted to build a trusted-hosts database through a service daemon, refreshing host trust states when a host reboots - this trusted computing pool RFC will be posted to the OpenStack mailing list for comment soon.
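A minimal sketch of the filter idea in point 1, assuming started_datetime is published as a capability as discussed above (the function and constant names are illustrative, not an existing nova host-filter driver):

import datetime

MIN_UPTIME = 300  # seconds a host must have been up before it is schedulable

def filter_recently_booted(service_states, min_uptime=MIN_UPTIME):
    # service_states follows the ZoneManager layout quoted earlier:
    # { <host> : { <service> : { capability : value } } }
    now = datetime.datetime.utcnow()
    hosts = []
    for host, services in service_states.items():
        caps = services.get('compute', {})
        started = caps.get('started_datetime')
        if started and (now - started).total_seconds() >= min_uptime:
            hosts.append(host)
    return hosts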

Thanks,
-Fred

Sandy Walsh (sandy-walsh) said :
#9

1. Yup, sounds correct.

2. I would make a new nova.scheduler.api method for this call rather than adjusting get_zone_capabilities, but you have the right idea.

Look forward to seeing the ML RFC!

fred yang (fred-yang) said :
#10

Thanks Sandy Walsh, that solved my question.

Sandy Walsh (sandy-walsh) said :
#11

np Fred, happy to help ... don't hesitate to ping me if you run into issues.

fred yang (fred-yang) said :
#12

Sandy,

Continuing the ZoneManager.service_states[] locking question on this same thread.

ZoneManager.update_service_capabilities() updates service_states from each compute node's periodic_tasks, while JsonFilter.filter_hosts() also loops through service_states[] to filter hosts. There is no data locking while both paths access service_states[] - is access implicitly serialized through AMQP, since nova-scheduler runs both the ZoneManager update and filter_hosts? Am I reading that correctly?

nova-scheduler also runs SchedulerManager.ping() periodically through a greenthread. If we derive ping() from SchedulerManager to update service_states[] periodically on the same scheduler node, is any data locking needed, and if so, what would be the better locking method?

Thanks,
-Fred

Sandy Walsh (sandy-walsh) said :
#13

Hey Fred,

Correct, it's my understanding that since we're using Eventlet (a Reactor pattern), we shouldn't run into concurrency issues since the service is essentially single threaded. GreenPool is an Eventlet-aware mechanism, so it should be safe too.

The downside is that it won't take advantage of multiple cores/processors, but we can fire up more than one service.
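A tiny illustration of why the cooperative model keeps those shared updates safe (the names below are purely illustrative):

import eventlet

service_states = {}  # shared between greenthreads, no lock needed

def update_capabilities(host):
    # No yield between the read and the write, so this block cannot be
    # interleaved with another greenthread touching the same dict.
    state = service_states.setdefault(host, {'report_count': 0})
    state['report_count'] += 1
    eventlet.sleep(0)  # explicit yield point: other greenthreads run here

pool = eventlet.GreenPool()
for host in ['compute1', 'compute2']:
    pool.spawn_n(update_capabilities, host)
pool.waitall()
print(service_states)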

-S

fred yang (fred-yang) said :
#14

GreenThread follows a cooperative-yield model according to the Eventlet docs, so it should be safe. Thanks for shedding light on this.