Startup problems with a compute node in a multi-node cluster
I am setting up a two-node cluster: one node (let's call it nova1) runs all services, and the other (nova2) runs only nova-compute. The first node works fine, but on the compute node, nova2, the nova-compute service does not start properly. In the syslog, I see lines like:
Jun 28 06:25:14 compute1 init: nova-compute main process (2914) terminated with status 1
Jun 28 06:25:14 compute1 init: nova-compute main process ended, respawning
(repeating every second)
In /var/log/
(OperationalError) (1054, "Unknown column 'instances.
Probably because of this error, Nova never sets up the networking bridges and routes, and the compute node cannot access guest instances running on the nova1 node.
Interestingly, the command line tools that I tried on nova2 ("euca-
Question information
- Language: English
- Status: Solved
- Assignee: No assignee
- Solved by: Brian Waldon
- Related FAQ: None
#1
Information about my setup:
- each host has two NICs: one on the private management subnet (192.168.11.x), and another on the public internet
- FlatDHCPManager
- guest instances run on a virtual network 10.0.0.0/12, starting at 10.0.1.2
- nova1 is the network controller and has an address on the guest network: 10.0.1.1
#2
I've found that any time you see "Unknown column" errors in your logs, you've got a version mismatch. Confirm that you're running the same version of Nova on both nodes:
dpkg -l '*nova*'
Everett
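A minimal sketch of that comparison, assuming you have saved the output of `dpkg -l '*nova*'` from each node as text. It relies only on the usual `dpkg -l` row layout (status, name, version, then description) and compares just the packages installed on both nodes, since nova2 legitimately has fewer Nova packages than nova1. The function names are hypothetical.

```python
def parse_dpkg_list(output: str) -> dict[str, str]:
    """Map package name -> version for rows in the installed ("ii") state."""
    versions = {}
    for line in output.splitlines():
        fields = line.split()
        # Header and separator lines ("Desired=...", "||/ Name ...", "+++-===")
        # never have "ii" as their first field, so they are skipped here.
        if len(fields) >= 3 and fields[0] == "ii":
            versions[fields[1]] = fields[2]
    return versions


def version_mismatches(node_a: str, node_b: str) -> dict[str, tuple[str, str]]:
    """Return {package: (version_a, version_b)} for packages installed on both
    nodes whose versions differ. Packages present on only one node are ignored,
    because the nodes run different sets of services."""
    a, b = parse_dpkg_list(node_a), parse_dpkg_list(node_b)
    return {
        pkg: (a[pkg], b[pkg])
        for pkg in sorted(a.keys() & b.keys())
        if a[pkg] != b[pkg]
    }
```

For example, calling `version_mismatches()` on the saved listings from nova1 and nova2 and getting a non-empty result for a shared package such as the Nova Python library would be consistent with the "Unknown column" errors.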
On Tue, Jun 28, 2011 at 3:41 PM, Davor Cubranic <email address hidden> wrote:
> Question #163082 on OpenStack Compute (nova) changed:
> Davor Cubranic gave more information on the question:
> Information about my setup:
> [...]
#3
You need to make sure the hosts are talking to the same database. It sounds like the compute host is talking to a local (older) database.
Vish
#4
Everett, there were pending updates on node1, while node2 was already up to date. However, once I updated node1 and rebooted it, everything stopped working. I get numerous DB-related errors in various services' logs:
- nova-network.log: (OperationalError) (1054, "Unknown column 'instances_
- nova-compute.log: (OperationalError) (1054, "Unknown column 'instances.
- nova-api.log: Unexpected error raised: 'NoneType' object does not support item assignment
- nova-manage.log: CRITICAL nova [-] enable() takes exactly 3 arguments (1 given)
Did some migration not run on the database after packages were upgraded? Is there a way to recover the database, or at least to reset it so that services can start running again?
#5
Vish, how can I tell what database a host is talking to? I do have "sql_connection" set properly in nova.conf.
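One way to answer that on each host is a sketch like the following, assuming the services read their settings from nova.conf: pull out the effective sql_connection value and split it into backend, host, port and database, then compare the results from nova1 and nova2. It accepts both a `--sql_connection=...` flagfile line and an INI-style `sql_connection = ...` line, since which one applies depends on the Nova version; the helper names are hypothetical. It only reads the file, so a service started with a different config file or extra command-line flags could still be using another database.

```python
from urllib.parse import urlsplit


def find_sql_connection(conf_text: str):
    """Return the last sql_connection value set in a nova.conf, or None.

    None means the file does not set the flag, so the service falls back to
    whatever default its Nova version has built in.
    """
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # "--sql_connection=URL" and "sql_connection = URL" both split here;
        # partition() keeps any further "=" characters inside the URL.
        key, sep, val = line.lstrip("-").partition("=")
        if sep and key.strip() == "sql_connection":
            value = val.strip()  # a later setting overrides an earlier one
    return value


def describe_db(url: str) -> dict:
    """Split a SQLAlchemy-style URL into the parts worth comparing across nodes."""
    parts = urlsplit(url)
    return {
        "backend": parts.scheme,
        "host": parts.hostname or "localhost",
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }
```

If nova2 reports a `sqlite` backend, or a host other than nova1's address on the 192.168.11.x management subnet, it is not sharing nova1's database.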
#6
Yes.
nova-manage db sync
The packages do not attempt to auto-update the database in a multi-node deployment; you will need to run it manually.
Vish
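The ordering implied by this thread (the same package versions on every node, a single manual `nova-manage db sync` against the shared database, then service restarts) can be written down as a plan. This is only a sketch: it builds commands and runs nothing, and the `apt-get` command, the `service ... restart` form, and the host and service names in the test are assumptions for an Ubuntu package install.

```python
def upgrade_plan(nodes: dict[str, list[str]], db_node: str) -> list[tuple[str, str]]:
    """Return (host, command) pairs in execution order; nothing is executed.

    nodes maps host name -> Nova services running on that host.
    db_node is the one host that runs the schema migration.
    """
    if db_node not in nodes:
        raise ValueError(f"db_node {db_node!r} is not one of the nodes")
    plan = []
    # 1. Bring every node to the same package versions first (assumed apt-based).
    for host in nodes:
        plan.append((host, "apt-get update && apt-get upgrade"))
    # 2. Migrate the shared schema exactly once; the packages will not do it.
    plan.append((db_node, "nova-manage db sync"))
    # 3. Restart services so they run the new code against the new schema.
    for host, services in nodes.items():
        for service in services:
            plan.append((host, f"service {service} restart"))
    return plan
```

Running the migration from a single node matters because all nodes share one database; the other nodes only need matching code.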
On Jun 28, 2011, at 3:11 PM, Davor Cubranic wrote:
> Davor Cubranic posted a new comment:
> Everett, there were updates on node1, while node2 was up to date.
> [...]
#7
Thanks, Vish. Running "nova-manage db sync" got me to the point where I can again run and access instances on node1. But I still see the following error in nova-api.log when I run "euca-describe-
Unexpected error raised: 'NoneType' object does not support item assignment
(nova.api): TRACE: Traceback (most recent call last):
(nova.api): TRACE: File "/usr/lib/
", line 320, in __call__
(nova.api): TRACE: result = api_request.
(nova.api): TRACE: File "/usr/lib/
py", line 78, in invoke
(nova.api): TRACE: result = method(context, **args)
(nova.api): TRACE: File "/usr/lib/
line 1097, in describe_images
(nova.api): TRACE: images = self.image_
(nova.api): TRACE: File "/usr/lib/
(nova.api): TRACE: return self.service.
(nova.api): TRACE: File "/usr/lib/
(nova.api): TRACE: limit=limit)
(nova.api): TRACE: File "/usr/lib/
(nova.api): TRACE: params = self._extract_
(nova.api): TRACE: File "/usr/lib/
(nova.api): TRACE: result[
(nova.api): TRACE: TypeError: 'NoneType' object does not support item assignment
(nova.api): TRACE:
Restarting the nova-api service does not help; the error is still there.
#8
Also, there is still no Nova-related networking set up on the second node (the br100 bridge and the routes/iptables rules needed to ping the VMs).
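To check that state on nova2 directly, a small sketch can read Linux sysfs: every network device has a directory under /sys/class/net, a bridge additionally has a `bridge` subdirectory, and the interfaces attached to it are listed under `brif`. This is roughly the information `brctl show br100` prints. The function name is hypothetical, and it only inspects; it does not create anything.

```python
import os


def bridge_status(name: str = "br100", sysfs_net: str = "/sys/class/net") -> dict:
    """Report whether a network device exists, whether it is a bridge,
    and which interfaces are attached to it."""
    dev = os.path.join(sysfs_net, name)
    exists = os.path.isdir(dev)
    # Only bridge devices have a "bridge" subdirectory in sysfs.
    is_bridge = exists and os.path.isdir(os.path.join(dev, "bridge"))
    brif = os.path.join(dev, "brif")
    # Each attached port appears as an entry (a symlink on a real system) under brif/.
    ports = sorted(os.listdir(brif)) if is_bridge and os.path.isdir(brif) else []
    return {"exists": exists, "is_bridge": is_bridge, "ports": ports}
```

On nova2, `bridge_status()` returning `exists: False` confirms that nothing has created br100 yet; the `sysfs_net` parameter exists only so the function can be pointed at a test directory.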
#9
Davor,
"TypeError: 'NoneType' object does not support item assignment"
This is a bug that was recently fixed in Glance: https:/
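For context on the error itself: the traceback in #7 ends with an item assignment on `result` inside Glance's client-side parameter handling, and Python raises this TypeError whenever the object being assigned into is None. Glance's actual code and fix are not quoted here, so the following is a generic toy reproduction with hypothetical function names, not the Glance patch.

```python
def extract_params_buggy(filters):
    """Toy example: uses the caller's filters as the result dict directly."""
    result = filters       # None when the caller passed no filters
    result["limit"] = 10   # raises TypeError if result is None
    return result


def extract_params_fixed(filters):
    """Same idea, but starts from a fresh dict, so None (or a missing
    argument) is handled and the caller's dict is not mutated."""
    result = dict(filters or {})
    result["limit"] = 10
    return result
```

Calling `extract_params_buggy(None)` produces the same message seen in nova-api.log, while `extract_params_fixed(None)` returns `{"limit": 10}`.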
#10
Thanks Brian.
There is a new set of updates today, but it doesn't fix this yet. Do you know which release your fix will be in? I have python-glance 2011.3~
#11
Hey Davor, the bug fix went into r148 of Glance, so it seems you should have it (I think; I'm not an avid user of the PPAs). If you've restarted all applicable services and this is still happening, feel free to submit a bug report or paste the latest error stack.
#12
I see it with 2011.3~
2011-06-29 16:03:25,080 DEBUG nova.api [-] action: DescribeImages from (pid=24227) __call__ /usr/lib/
2011-06-29 16:03:25,080 DEBUG nova.api [-] arg: Owner.1 val: self from (pid=24227) __call__ /usr/lib/
2011-06-29 16:03:25,081 ERROR nova.api [35N3X4O8-
Unexpected error raised: Unable to connect to server. Got error: [Errno 111] ECONNREFUSED
(nova.api): TRACE: Traceback (most recent call last):
(nova.api): TRACE: File "/usr/lib/
", line 320, in __call__
(nova.api): TRACE: result = api_request.
(nova.api): TRACE: File "/usr/lib/
py", line 78, in invoke
(nova.api): TRACE: result = method(context, **args)
(nova.api): TRACE: File "/usr/lib/
(nova.api): TRACE: images = self.image_
(nova.api): TRACE: File "/usr/lib/
(nova.api): TRACE: return self.service.
(nova.api): TRACE: File "/usr/lib/
(nova.api): TRACE: limit=limit)
(nova.api): TRACE: File "/usr/lib/
(nova.api): TRACE: res = self.do_
(nova.api): TRACE: File "/usr/lib/
(nova.api): TRACE: headers, params)
(nova.api): TRACE: File "/usr/lib/
(nova.api): TRACE: "server. Got error: %s" % e)
(nova.api): TRACE: ClientConnectio
(nova.api): TRACE:
#13
Davor: It looks like either your Glance server is not running, or the glance-api-servers flag in your Nova config is incorrect. Check both of those things.
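A quick way to check the first point is to open a TCP connection to each configured Glance API endpoint; a refused connection corresponds to the `[Errno 111] ECONNREFUSED` in #12. The sketch assumes the flag's value is a comma-separated list of `host[:port]` entries and falls back to port 9292, the default mentioned later in this thread. The exact flag spelling in nova.conf and its value format are assumptions, and the helper names are hypothetical.

```python
import socket


def parse_api_servers(value: str, default_port: int = 9292) -> list[tuple[str, int]]:
    """Turn an assumed "host[:port],host[:port]" string into (host, port) pairs."""
    servers = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.rpartition(":")
        if not sep:  # no port given, so use the default
            host, port = item, str(default_port)
        servers.append((host, int(port)))
    return servers


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection succeeds; False on refusal, timeout, or lookup failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError (ECONNREFUSED) and timeouts
        return False
```

With nothing configured, `parse_api_servers("localhost")` gives `[("localhost", 9292)]`, and `is_listening` returning False for that pair on the API host would match the connection errors above.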
#14
I don't have a Glance server running, and none is configured in nova.conf. Is this a new requirement? I didn't seem to need one when I first installed OpenStack a few weeks ago (following the steps for manual install in docs.openstack.
#15
You may have installed Nova before Glance was required. We recently removed the LocalImageService in favor of the filesystem backend in Glance. Right now, your installation seems to be looking at the default localhost:9292, which is why you're seeing the connection errors. You'll need to install Glance and configure the 'glance_
#16
Really? Are you guys aware that there is no mention of this in the official (?) docs on docs.openstack.org? And judging by the install script you provide [1], which has no handling of Glance at all, I assume it also expects to use the LocalImageService.
[1] https:/
#17
Thank you for your help, Davor. I will make sure our image service documentation is up to date. Keep in mind the docs you refer to are for our latest official release, Cactus. If you are using the Diablo trunk or milestone packages, it isn't safe to refer to those docs.
I think the question here has been answered; please reopen it if you feel otherwise.
#18
Thanks for the explanation, Brian. I think part of the problem is that the Cactus docs use the "nova-trunk" PPA, so anyone following them now will run into problems because they won't know that Glance needs to be available. I'll add a comment to the page that references the trunk, but please let me know if it would also be useful to open a Launchpad bug for this.
On second look, the automated install script might not have this problem, because it uses the "release" PPA; people using it should be fine.
#19
Thanks Brian Waldon, that solved my question.
#20
I added this bug to the openstack-manuals project:
https:/
I think that is all we need for now. Please file any other discrepancies you find in the docs in that project as well. Thanks!
#21
I opened a separate bug about the use of trunk PPA in Cactus docs: https:/