High Availability & Cloud Controller

Asked by Chris Godwin on 2011-10-03

I am in the middle of deciding whether to use OpenStack or openQRM, and it all depends on how high availability is handled. So my question is as follows.

Does the cloud controller always need to be available? If so, how can I accomplish this via Heartbeat and DRBD? I know how to build a pair of servers into an HA/DRBD cluster and put MySQL on top of that; is that, plus installing nova, all that is required?

If the cloud controller doesn't need to be available at all times, what will my repair window look like? Do I have until the next boot or reboot of a VM to fix a broken RAID on a controller?

I hope my questions turn out to be irrelevant and that OpenStack handles this gracefully when there are enough nodes to replicate the data; if that's the case, how many nodes will I need?

I'm trying to run a general-purpose cloud for quick creation and deletion of customer VMs, plus the ability to host an EC2-style cloud with elastic capabilities.

Question information

Language:
English
Status:
Answered
For:
OpenStack Compute (nova)
Assignee:
No assignee
Last query:
2011-10-03
Last reply:
2011-10-10
Chris Godwin (patchshorts) said : #1

Seriously? No one knows the answer to my question? Surely I'm not the only one thinking about the availability of the cloud controller.

Vish Ishaya (vishvananda) said : #2

So "Cloud Controller" is a somewhat generic term. There are multiple services:

1) nova-api

You should be able to run multiple copies of this and put a load balancer in front.
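For instance, a minimal HAProxy configuration could spread requests across two nova-api hosts. The addresses, server names, and the 8774 port below are illustrative assumptions, not values from this thread:

```
# Sketch: load-balance two nova-api instances (addresses are assumptions)
listen nova-api
    bind 0.0.0.0:8774
    mode http
    balance roundrobin
    server api1 10.0.0.11:8774 check
    server api2 10.0.0.12:8774 check
```

With health checks enabled, a failed API host is simply taken out of rotation, so clients never need to know which copy served them.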

2) nova-scheduler

You can run multiple copies of this as needed. There is a small failure scenario where a crash mid-request could cause a single request to be lost.

3) Mysql

Standard master/slave replication with DRBD is fine here.
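A DRBD resource for the MySQL data directory might look like the sketch below; hostnames, devices, and addresses are illustrative assumptions:

```
# Sketch: drbd.conf resource mirroring the MySQL data volume
# (node names, devices, and IPs are assumptions)
resource mysql {
    protocol C;              # synchronous replication
    device    /dev/drbd0;
    disk      /dev/sdb1;
    meta-disk internal;
    on node1 {
        address 10.0.0.21:7788;
    }
    on node2 {
        address 10.0.0.22:7788;
    }
}
```

Heartbeat (or Pacemaker) then promotes the secondary, mounts the DRBD device, and starts mysqld on failover.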

4) Rabbit

Rabbit just added active/active mode:

http://www.rabbitmq.com/ha.html
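As a sketch, clustering two brokers from the second node looked roughly like this in the releases current at the time (node names are assumptions; later RabbitMQ releases renamed the command to `join_cluster` and configure queue mirroring with `rabbitmqctl set_policy` rather than the per-queue `x-ha-policy` argument described at the link above):

```
# Sketch: add rabbit@node2 to an existing cluster with rabbit@node1
rabbitmqctl -n rabbit@node2 stop_app
rabbitmqctl -n rabbit@node2 cluster rabbit@node1
rabbitmqctl -n rabbit@node2 start_app
```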


Yun Mao (yunmao) said : #3

How do multiple scheduler instances work?

My understanding is that the compute-api stub calls _ask_scheduler_to_create_instance, which casts the request to the scheduler topic. Assuming only one of the running schedulers picks it up, it will find the right node and cast the request to that node.

But a scheduler is supposed to have global knowledge of resource utilization. If there are multiple of them running together, I think LeastCostScheduler is no longer strictly least-cost but only approximately so. There is the possibility that a compute node has room for only one VM but gets scheduled concurrently and receives two requests. Wouldn't that be a problem?
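The race can be illustrated with a toy model (this is not nova code): two scheduler instances each read the same capacity snapshot before either cast lands, so a node with room for one VM receives two requests.

```python
# Toy model of the multi-scheduler race: each scheduler decides from a
# snapshot of free capacity taken before either cast has landed.

class Node:
    def __init__(self, capacity):
        self.capacity = capacity
        self.vms = 0

def pick_and_cast(snapshot_free, node):
    """A scheduler decides based on its (possibly stale) snapshot,
    then casts the request; the cast lands regardless of current state."""
    if snapshot_free > 0:
        node.vms += 1

node = Node(capacity=1)

# Both schedulers take their snapshots before either request lands.
snap_a = node.capacity - node.vms   # 1 -> scheduler A sees room
snap_b = node.capacity - node.vms   # 1 -> scheduler B sees room too

pick_and_cast(snap_a, node)
pick_and_cast(snap_b, node)

print(node.vms)       # 2 -> the node is overcommitted past capacity 1
```

This is exactly the overcommit scenario described above: each scheduler's decision was locally correct against its snapshot, but the combined result exceeds the node's capacity.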
