What about "controller" migration?

Asked by Mercadolibre CloudBuilders on 2011-08-11

Exactly.
Since I can't seem to find this anywhere (I'm really sorry if the answer exists and I couldn't find it): suppose a controller fails (hardware failure). We already had the other necessary services on one of the compute nodes, so here is what we did:

#1 disabled the network & scheduler services on the dead controller using "nova-manage service disable"
#2 modified the nova.conf file (all our compute nodes use the same nova.conf from an NFS share) to point to the "new" controller
#3 started the network & scheduler services on the "new" controller
#4 ran "nova-manage service list" to check that everything was fine:

melicloud@compute10:~$ sudo nova-manage service list
compute10 nova-scheduler enabled :-) 2011-08-11 03:17:44
compute10 nova-network enabled :-) 2011-08-11 03:17:46
compute10 nova-compute enabled :-) 2011-08-11 03:17:37
compute11 nova-compute enabled :-) 2011-08-11 03:17:37
deadcontroller nova-scheduler disabled XXX 2011-07-21 21:53:11
deadcontroller nova-network disabled XXX 2011-07-21 21:53:00
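A listing like this can also be checked programmatically. The following is a minimal Python sketch, assuming the six whitespace-separated columns shown above (host, service binary, enabled/disabled, `:-)` or `XXX`, date, time); the helper names `parse_service_list` and `live_hosts` are hypothetical and not part of nova-manage.

```python
from typing import NamedTuple


class ServiceRow(NamedTuple):
    host: str
    binary: str
    enabled: bool      # "enabled" vs "disabled" column
    alive: bool        # ":-)" = recent heartbeat, "XXX" = none recently
    updated_at: str    # last heartbeat timestamp as printed


def parse_service_list(text: str) -> list[ServiceRow]:
    """Parse the whitespace-separated listing (layout assumed from above)."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Expected: host binary enabled|disabled :-)|XXX date time
        if len(parts) != 6 or parts[2] not in ("enabled", "disabled"):
            continue  # skip the shell prompt, blank lines, anything else
        host, binary, status, state, date, clock = parts
        rows.append(ServiceRow(host, binary, status == "enabled",
                               state == ":-)", f"{date} {clock}"))
    return rows


def live_hosts(rows: list[ServiceRow], binary: str) -> list[str]:
    """Hosts where `binary` is both enabled and heartbeating."""
    return [r.host for r in rows if r.binary == binary and r.enabled and r.alive]
```

Fed the listing above, `live_hosts(rows, "nova-network")` returns only `compute10`. That matches what we saw: the service registry already looked correct, and the stale "deadcontroller" reference lived elsewhere in the database, as the replies below explain.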

Then we restarted the libvirt-bin & nova-compute services on all nodes, and what happened?
When we tried to start a new instance, the new scheduler selected a compute node, but on the compute node side we saw that it kept trying to connect to the nova-network service on "deadcontroller" to get an available IP address for the instance, so the launch failed.

The thing is, we didn't find anything in the nova database (MySQL) that pointed to something we did wrong.
So, has anyone had the chance to migrate a failed controller, or can anyone guess what we are doing wrong?

Best regards.

Question information

Language: English
Status: Solved
For: OpenStack Compute (nova)
Assignee: No assignee
Solved by: Mercadolibre CloudBuilders
Solved: 2011-08-11
Last query: 2011-08-11
Last reply: 2011-08-11

Vish Ishaya (vishvananda) said : #1

You need to update the host reference in the network database to use the
hostname of the new controller. Another option is to give the new controller
the same hostname as the old one.
On Aug 10, 2011 8:26 PM, "Mercadolibre CloudBuilders" <
<email address hidden>> wrote:
> New question #167613 on OpenStack Compute (nova):
> https://answers.launchpad.net/nova/+question/167613
>
> --
> You received this question notification because you are a member of Nova
> Core, which is an answer contact for OpenStack Compute (nova).

Mercadolibre CloudBuilders said : #2

Hi Vish!

Changing the hostname of the new controller is not an option for us, because that machine WAS a compute node and is running a lot of instances, so I want to understand the solution of updating the network database.

What do you mean by "update the host reference in the network database to use the hostname of the new controller"? Is it a specific table inside the "nova" database? If so, which table would that be?

Also, is there a way to know which fields in the database are affected?

Thanks, Vish!

Vish Ishaya (vishvananda) said : #3

I guess that was a little unclear. There are various host columns: one on
the networks table, one on the floating ops table, one on the fixed ops table,
and one on the instances table. You should check all of these for the old
hostname and change them to the new one.
On Aug 11, 2011 5:41 AM, "Mercadolibre CloudBuilders" <
<email address hidden>> wrote:
> Question #167613 on OpenStack Compute (nova) changed:
> https://answers.launchpad.net/nova/+question/167613
>
> Status: Answered => Open

Vish Ishaya (vishvananda) said : #4

s/floating ops/floating_ips
s/fixed ops/fixed_ips
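Putting #3 and #4 together, the fix is one UPDATE per table. Below is a minimal Python sketch, assuming each of the four tables stores the hostname in a column literally named `host` (the thread only says "host columns"), and using an in-memory SQLite database as a stand-in for the MySQL nova database. `retarget_host` is a hypothetical helper, not a nova-manage command. Its default dry run rolls back and only reports how many rows per table still point at the old hostname, which is one way to see which fields are affected before changing anything. Against MySQL the same statements would go through a MySQL driver, and most Python MySQL drivers use `%s` rather than `?` as the placeholder.

```python
import sqlite3

# Tables named in #3/#4. The column name "host" is an assumption
# based on "there are various host columns".
HOST_TABLES = ("networks", "floating_ips", "fixed_ips", "instances")


def retarget_host(conn: sqlite3.Connection, old: str, new: str,
                  dry_run: bool = True) -> dict[str, int]:
    """Point rows whose host column equals `old` at `new` instead.

    Returns the number of matching rows per table. With dry_run=True the
    updates are rolled back, so the counts are only a preview.
    """
    counts: dict[str, int] = {}
    try:
        for table in HOST_TABLES:
            # Table names come from the fixed tuple above; hostnames are
            # bound parameters rather than pasted into the SQL string.
            cur = conn.execute(
                f"UPDATE {table} SET host = ? WHERE host = ?", (new, old))
            counts[table] = cur.rowcount
        if dry_run:
            conn.rollback()
        else:
            conn.commit()  # all four tables change together or not at all
    except sqlite3.Error:
        conn.rollback()
        raise
    return counts


if __name__ == "__main__":
    # Toy stand-in for the real (MySQL) nova schema: just id + host.
    conn = sqlite3.connect(":memory:")
    for table in HOST_TABLES:
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, host TEXT)")
    conn.execute("INSERT INTO networks (host) VALUES ('deadcontroller')")
    conn.execute("INSERT INTO fixed_ips (host) VALUES ('deadcontroller')")
    conn.execute("INSERT INTO instances (host) VALUES ('compute11')")
    conn.commit()

    print(retarget_host(conn, "deadcontroller", "compute10"))  # preview
    retarget_host(conn, "deadcontroller", "compute10", dry_run=False)
    print(conn.execute("SELECT host FROM networks").fetchall())
```

Running all four updates in one transaction keeps the tables consistent: either every reference moves to the new controller or none does. On MySQL this only holds if the tables use a transactional engine such as InnoDB; MyISAM tables do not support rollback.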

Mercadolibre CloudBuilders said : #5

Thanks, Vish!

We'll try that!