Failed to deploy machine with HP Smart Array Raid 6i

Bug #1562249 reported by Robin
This bug affects 4 people
Affects             Status        Importance  Assigned to  Milestone
Landscape Server    Invalid       Undecided   Unassigned
MAAS                Invalid       Undecided   Unassigned
curtin              Fix Released  Undecided   Unassigned
curtin (Ubuntu)     Fix Released  Undecided   Unassigned
  Trusty            Confirmed     Medium      Unassigned
  Xenial            Fix Released  Undecided   Unassigned

Bug Description

[Impact]

 * Deploying a machine with an HP Smart Array RAID 6i fails because curtin
   miscalculates the device path to CCISS partitions.

   Curtin has been updated to correctly calculate and handle the different
   forms that HP CCISS array device and partition names take in /dev and
   in /sys.
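The path translation at the heart of this fix can be sketched in Python (curtin's own language). This is a minimal illustration, not curtin's code: all helper names are hypothetical, and naive_holders_path is only a guess at the pre-fix logic, reconstructed from the '/sys/block/c0d0/holders' error in the test case below. It relies on one convention visible elsewhere in this report: the node /dev/cciss/c0d0 appears in sysfs as cciss!c0d0.

```python
import os

DEV_PREFIX = "/dev/"
SYSFS_BLOCK = "/sys/class/block"


def dev_to_kname(devpath: str) -> str:
    """Hypothetical helper: /dev/cciss/c0d0p5 -> cciss!c0d0p5."""
    if not devpath.startswith(DEV_PREFIX):
        raise ValueError(f"not a /dev path: {devpath!r}")
    # A sysfs entry is a single path component, so '/' in the name becomes '!'.
    return devpath[len(DEV_PREFIX):].replace("/", "!")


def kname_to_dev(kname: str) -> str:
    """Hypothetical inverse mapping: cciss!c0d1 -> /dev/cciss/c0d1."""
    return DEV_PREFIX + kname.replace("!", "/")


def sysfs_holders_path(devpath: str) -> str:
    """Holders directory for a disk or a partition (/sys/class/block lists both)."""
    return os.path.join(SYSFS_BLOCK, dev_to_kname(devpath), "holders")


def naive_holders_path(devpath: str) -> str:
    """ASSUMED pre-fix logic: basename() silently drops the 'cciss/' component."""
    return os.path.join("/sys/block", os.path.basename(devpath), "holders")


if __name__ == "__main__":
    for dev in ("/dev/cciss/c0d0", "/dev/cciss/c0d0p5", "/dev/sda"):
        print(f"{dev}: fixed={sysfs_holders_path(dev)} naive={naive_holders_path(dev)}")
```

For /dev/cciss/c0d0 the naive form gives /sys/block/c0d0/holders, the missing path in the FAIL message, while the translated form gives /sys/class/block/cciss!c0d0/holders. Names without a slash, such as sda, pass through unchanged, which is the point made under [Regression Potential].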

[Test Case]

 * Install the proposed curtin package and deploy Ubuntu images to a system
   with an HP Smart Array device.

   PASS: Ubuntu OS successfully installed

   FAIL: Ubuntu fails to install with error message like:

    An error occured handling 'cciss!c0d0':
    OSError - [Errno 2] No such file or directory: '/sys/block/c0d0/holders'

[Regression Potential]

 * The fix changes how curtin determines the sysfs path to block devices
   and partitions, so it is possible that systems which currently deploy
   successfully could fail to deploy. However, this risk is mitigated by
   the fact that only HP Smart Array devices use this special naming scheme
   for partitions, so it is unlikely that non-HP storage devices use the
   kernel path /dev/cciss as a prefix.

[Original Description]
Attempting to deploy a machine with an HP Smart Array RAID 6i fails. The installation output contains:

Error: Partition(s) 5 on /dev/cciss/c0d0 have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
An error occured handling 'cciss!c0d0': OSError - [Errno 2] No such file or directory: '/sys/block/c0d0/holders'
[Errno 2] No such file or directory: '/sys/block/c0d0/holders'
Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'block-meta', 'custom']
Exit code: 3
Reason: -
Stdout: "Error: Partition(s) 5 on /dev/cciss/c0d0 have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.\nAn error occured handling 'cciss!c0d0': OSError - [Errno 2] No such file or directory: '/sys/block/c0d0/holders'\n[Errno 2] No such file or directory: '/sys/block/c0d0/holders'\n"


Revision history for this message
Robin (robinrego) wrote :
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

I have two suggestions:
a) do a node deployment without juju, just maas.
b) do a juju node deployment

For (a), all you have to do is import a public SSH key into your MAAS user, then select a node and hit "deploy". It should install Ubuntu on the node, and you should be able to ssh in as the ubuntu user using that key.

If (a) worked, then you can try (b). To do that, configure juju to use MAAS as a provider following the instructions here: https://jujucharms.com/docs/stable/config-maas

Then do a juju bootstrap. This is what the autopilot does.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

You can also check the MAAS UI, specifically the page for node node-089caa3c-f27c-11e5-8e87-0014c2c1fead, to see whether MAAS logged any errors about it. They should be near the bottom of the page, under the "Installation output" option in the dropdown menu.

Changed in landscape:
status: New → Incomplete
Revision history for this message
Robin (robinrego) wrote :

I tried suggestion (a) and it works on the other nodes: I am able to deploy them and they show 'Deployed' in the MAAS UI.
When I try it with my problem node (HP server DL380 G4, LAN MAC 0014c2c1fead), it shows 'Failed deployment' in the MAAS UI.

Even though the node shows failed deployment, I was able to ssh into the node from the MAAS server.

The errors logged in the MAAS UI installation output for the 'failed deployment' node are pasted below.

Error: Partition(s) 5 on /dev/cciss/c0d0 have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
An error occured handling 'cciss!c0d0': OSError - [Errno 2] No such file or directory: '/sys/block/c0d0/holders'
[Errno 2] No such file or directory: '/sys/block/c0d0/holders'
Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'block-meta', 'custom']
Exit code: 3
Reason: -
Stdout: "Error: Partition(s) 5 on /dev/cciss/c0d0 have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.\nAn error occured handling 'cciss!c0d0': OSError - [Errno 2] No such file or directory: '/sys/block/c0d0/holders'\n[Errno 2] No such file or directory: '/sys/block/c0d0/holders'\n"
Stderr: ''

Thanks

Revision history for this message
Robin (robinrego) wrote :

Re: "Even though the node shows failed deployment, I was able to ssh into the node from the maas server": now I'm not sure about this.
I deleted the node, then re-enlisted, commissioned and deployed it. I got the same errors, but this time I am not able to ssh into the node.

Please see the attached file, which has the output of cat /var/log/cloud-init-output.log. This was from when I was able to ssh into the node in spite of the MAAS UI saying failed deployment.

Thanks

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

What are the disks or block devices attached to the problem node? Anything that makes it different from the others? Like an sd card? usb pendrive? PCIe SSD? Some removable storage? Can you get me the lshw output for it? It should be in the node page in MAAS, in yaml or xml format IIRC.

Revision history for this message
Robin (robinrego) wrote :

It has four SCSI 36.4 GB 15K rpm drives. Earlier I set them up as RAID 0 like the other machines. That didn't work out, so I tried RAID 1, using two drives for each array. This is the current setup.

I don't have any other drives attached to it.

I have attached the lshw for this node for your reference.

Thanks.

Revision history for this message
Robin (robinrego) wrote :

I did a fresh, clean install of Ubuntu 14.04 LTS this time, installed MAAS and deployed the remaining 4 nodes using the MAAS UI. I had the same problem with the same node, and it gives up with the message: Failed Deployment. See the attached file 'lshw -2' for reference.

Thanks.

Revision history for this message
Robin (robinrego) wrote :

I am able to SSH into the node that shows 'failed deployment'. Please let me know if there is any information you need from that machine that might help.

Here is the output of block-devices from the MAAS UI:

HpRS1.maas 00-maas-07-block-devices.out

[
 {
  "BLOCK_SIZE": "4096",
  "NAME": "sda",
  "ID_PATH": "/dev/disk/by-id/wwn-0x3000000100000001",
  "PATH": "/dev/sda",
  "ROTA": "1",
  "RM": "0",
  "MODEL": "VIRTUAL-DISK",
  "RO": "1",
  "SERIAL": "3000000100000001",
  "SIZE": "1468006400"
 },
 {
  "BLOCK_SIZE": "4096",
  "NAME": "cciss!c0d0",
  "ID_PATH": "/dev/disk/by-id/wwn-0x600508b100184439535350395850003d",
  "PATH": "/dev/cciss/c0d0",
  "ROTA": "1",
  "RM": "0",
  "MODEL": "LOGICAL VOLUME",
  "RO": "0",
  "SERIAL": "600508b100184439535350395850003d",
  "SIZE": "36414750720"
 },
 {
  "BLOCK_SIZE": "4096",
  "NAME": "cciss!c0d1",
  "ID_PATH": "/dev/disk/by-id/wwn-0x600508b100184439535350395850003e",
  "PATH": "/dev/cciss/c0d1",
  "ROTA": "1",
  "RM": "0",
  "MODEL": "LOGICAL VOLUME",
  "RO": "0",
  "SERIAL": "600508b100184439535350395850003e",
  "SIZE": "36414750720"
 }
]
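The block-device data above already shows both spellings side by side: NAME uses the sysfs form (cciss!c0d0) and PATH the /dev form (/dev/cciss/c0d0). A small consistency check over a trimmed copy of that data might look like the sketch below. The function names are hypothetical, and treating RO == "0" as "candidate install target" is an assumption made for illustration (here it filters out the read-only sda).

```python
import json

# Trimmed copy of the 00-maas-07-block-devices.out entries shown above.
SAMPLE = """[
 {"NAME": "sda", "PATH": "/dev/sda", "RO": "1", "MODEL": "VIRTUAL-DISK"},
 {"NAME": "cciss!c0d0", "PATH": "/dev/cciss/c0d0", "RO": "0", "MODEL": "LOGICAL VOLUME"},
 {"NAME": "cciss!c0d1", "PATH": "/dev/cciss/c0d1", "RO": "0", "MODEL": "LOGICAL VOLUME"}
]"""


def check_names(devices: list) -> list:
    """Hypothetical check: return (NAME, PATH, sysfs dir); raise if NAME and PATH disagree."""
    rows = []
    for dev in devices:
        name, path = dev["NAME"], dev["PATH"]
        # The sysfs name uses '!' where the /dev path has a '/'.
        expected = "/dev/" + name.replace("!", "/")
        if path != expected:
            raise ValueError(f"{name}: PATH is {path}, expected {expected}")
        rows.append((name, path, "/sys/class/block/" + name))
    return rows


def writable_disks(devices: list) -> list:
    """ASSUMPTION: devices not flagged read-only (RO == "0") are install candidates."""
    return [dev["NAME"] for dev in devices if dev["RO"] == "0"]


if __name__ == "__main__":
    devices = json.loads(SAMPLE)
    for row in check_names(devices):
        print(*row)
    print("writable:", writable_disks(devices))
```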

Here is the output of cat /proc/partitions from the failing machine:

ubuntu@HpRS1:~$ cat /proc/partitions
major minor #blocks name

  11 0 1048575 sr0
 104 0 35561280 cciss/c0d0
 104 1 29334528 cciss/c0d0p1
 104 2 1 cciss/c0d0p2
 104 5 6223872 cciss/c0d0p5
 104 16 35561280 cciss/c0d1
   8 0 1433600 sda
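/proc/partitions spells the same devices a third way: with a slash (cciss/c0d0p5) but without the /dev/ prefix. A hedged sketch that parses the listing above and derives the sysfs name for each entry; the #blocks column is in 1 KiB units, so 35561280 corresponds to about 33.9 GiB, the size lsblk reports below. The function names are hypothetical, and the partition pattern (c<controller>d<disk>p<partition>) is written only for this controller's naming.

```python
import re

# /proc/partitions as captured on HpRS1 above.
PROC_PARTITIONS = """major minor  #blocks  name

  11        0    1048575 sr0
 104        0   35561280 cciss/c0d0
 104        1   29334528 cciss/c0d0p1
 104        2          1 cciss/c0d0p2
 104        5    6223872 cciss/c0d0p5
 104       16   35561280 cciss/c0d1
   8        0    1433600 sda
"""

# Hypothetical pattern for cciss partitions only, e.g. cciss/c0d0p5.
CCISS_PART = re.compile(r"^(cciss/c\d+d\d+)p(\d+)$")


def parse_partitions(text: str) -> list:
    """Return one dict per entry with major, minor, blocks (KiB), name and sysfs kname."""
    entries = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 4:
            continue  # skip the blank separator line
        major, minor, blocks, name = fields
        entries.append({
            "major": int(major),
            "minor": int(minor),
            "blocks": int(blocks),
            "name": name,
            "kname": name.replace("/", "!"),
        })
    return entries


def partitions_of(entries: list, disk: str) -> list:
    """Partition numbers belonging to a cciss disk, e.g. 'cciss/c0d0' -> [1, 2, 5]."""
    numbers = []
    for entry in entries:
        match = CCISS_PART.match(entry["name"])
        if match and match.group(1) == disk:
            numbers.append(int(match.group(2)))
    return numbers


if __name__ == "__main__":
    for entry in parse_partitions(PROC_PARTITIONS):
        print(entry["name"], "->", entry["kname"])
```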

Here is the output of find /sys/block/ from the failing machine:

ubuntu@HpRS1:~$ find /sys/block/
/sys/block/
/sys/block/fd0
/sys/block/sda
/sys/block/sr0
/sys/block/ram0
/sys/block/ram1
/sys/block/ram2
/sys/block/ram3
/sys/block/ram4
/sys/block/ram5
/sys/block/ram6
/sys/block/ram7
/sys/block/ram8
/sys/block/ram9
/sys/block/loop0
/sys/block/loop1
/sys/block/loop2
/sys/block/loop3
/sys/block/loop4
/sys/block/loop5
/sys/block/loop6
/sys/block/loop7
/sys/block/ram10
/sys/block/ram11
/sys/block/ram12
/sys/block/ram13
/sys/block/ram14
/sys/block/ram15
/sys/block/cciss!c0d0
/sys/block/cciss!c0d1

And the output of lsblk from the failing node:

ubuntu@HpRS1:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.4G 1 disk /media/root-ro
sr0 11:0 1 1024M 0 rom
cciss!c0d0 104:0 0 33.9G 0 disk
├─cciss!c0d0p1 104:1 0 28G 0 part
├─cciss!c0d0p2 104:2 0 1K 0 part
└─cciss!c0d0p5 104:5 0 6G 0 part
cciss!c0d1 104:16 0 33.9G 0 disk

In all of the above, there is an 'sda' virtual drive, which may be causing c0d0 and c0d1 not to be detected.

If this is the case, please suggest a workaround or solution.

information type: Proprietary → Public
Changed in landscape:
status: Incomplete → Invalid
summary: - node 'failed deployment' during openstack install
+ Failed to deploy machine with HP Smart Array Raid 6i
description: updated
tags: added: kanban-cross-team landscape
removed: cloud-install-failure
tags: removed: kanban-cross-team
Changed in maas:
status: New → Invalid
Revision history for this message
Andres Rodriguez (andreserl) wrote :

Hi Robin,

It seems you are using a custom partitioning layout. Can you please attach the output of:

maas <maasuser> node get-curtin-config <system id>
maas <maasuser> node read <system id>

Also, please attach a full installation log (you can find it in the webUI at the bottom).

Changed in maas:
status: Invalid → Incomplete
Revision history for this message
Ryan Faircloth (ry3n) wrote :

I am having similar problems on a DL380G5 and DL360G5 using MAAS beta 3

Revision history for this message
Robin (robinrego) wrote :
Revision history for this message
Robin (robinrego) wrote :

Hi Andres

I have attached the files requested.

I do not need to use any custom partitioning and I am willing to make changes to the server configuration if necessary.

Thank you for your suggestions so far.

Robin (robinrego)
Changed in maas:
status: Incomplete → Confirmed
Ryan Harper (raharper)
tags: added: curtin-clear-holders
Ryan Harper (raharper)
tags: added: curtin-sru
Changed in curtin:
status: New → In Progress
Revision history for this message
Wesley Wiedenmeier (wesley-wiedenmeier) wrote :

It appears that some people have noticed the same issue and reported it in (LP: 1263181). There are several places in block_meta where curtin makes incorrect assumptions about the layout of /sys. I believe that, in addition to clear_holders not being able to operate on cciss devices, bcache configuration will not work on them either.

I am working on a fix right now, as I don't believe the fix I had been working on earlier is sufficient.

Revision history for this message
Wesley Wiedenmeier (wesley-wiedenmeier) wrote :

I have a fix in lp:~wesley-wiedenmeier/curtin/1562249, but it would be good to test it on a server with an hpsa device.

There is a build of the fixed version here:
https://launchpad.net/~wesley-wiedenmeier/+archive/ubuntu/test2/+build/10181498

Revision history for this message
Robin (robinrego) wrote :

I tested the fix but the HP-DL380-G4 and HP-DL380-G5 servers still could not be deployed.

Since this is my first attempt at trying out a fix for a bug, I'd like to describe how I tested, and I would appreciate guidance if I did not do it right.

I SSHed into the MAAS server and ran:
sudo add-apt-repository ppa:wesley-wiedenmeier/test2
sudo apt update
sudo apt upgrade
sudo apt dist-upgrade

I released the HP server nodes and then tried to deploy them. No success.

Then I tried the same on a fresh install of Ubuntu Server 14.04 LTS and MAAS, and the result was the same: all nodes except the HP G4 and G5 in my setup deployed, but the G4 and G5 could not be deployed.

Please note that I am able to deploy these HP G5 servers if I add the line:

cciss.blacklist=yes modprobe.blacklist=cciss hpsa.hpsa_allow_any=1

to the Global Kernel Parameters section (boot parameters to pass to the kernel by default) in the MAAS UI settings.

However, that does not help with the DL380-G4 machine.
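Each token in that workaround follows the kernel's module.parameter=value convention: modprobe.blacklist=cciss stops the cciss module from being loaded, and hpsa.hpsa_allow_any=1 lets the hpsa driver try Smart Array controllers it would not claim by default; presumably the logical drives then appear as ordinary SCSI disks instead of /dev/cciss nodes. The parser below is only an illustrative helper (parse_cmdline is a hypothetical name) for checking which module each option targets, for example against /proc/cmdline on a booted node.

```python
import shlex

WORKAROUND = "cciss.blacklist=yes modprobe.blacklist=cciss hpsa.hpsa_allow_any=1"


def parse_cmdline(cmdline: str) -> dict:
    """Hypothetical helper: group options as {module: {param: value}}.

    Options without a 'module.' prefix (e.g. 'quiet') are grouped under ''.
    """
    grouped = {}
    for token in shlex.split(cmdline):
        key, _, value = token.partition("=")
        module, dot, param = key.partition(".")
        if not dot:
            module, param = "", key
        grouped.setdefault(module, {})[param] = value
    return grouped


if __name__ == "__main__":
    for module, params in parse_cmdline(WORKAROUND).items():
        print(module, params)
```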

Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1562249] Re: Failed to deploy machine with HP Smart Array Raid 6i

On Thu, Jun 30, 2016 at 2:07 AM, Robin <email address hidden> wrote:

> I tested the fix but the HP-DL380-G4 and HP-DL380-G5 servers still could
> not be deployed.
>

Thanks for giving this a try.

>
> Since this is my first attempt and trying out a fix for a bug, I'd like
> to describe how I tested and would appreciate guidance if I did not do
> it right.
>
> SSH into maas server and added ---
> sudo add-apt-repository ppa:wesley-wiedenmeier/test2
> sudo apt update
> sudo apt upgrade
> sudo apt dist-upgrade
>
> Released hp server nodes and then tried to deploy. No success.
>

That looks correct. Can you confirm the installed curtin package version
with:

apt-cache policy python3-curtin

>
> Then I tried doing the same on a fresh install of ubuntu server 14.04 LTS
> and Maas and the result was the same.. i.e all except the HP G4 & G5 in
> mysetup could not be deployed.
>

Which maas version are you running?

Can you do the following and try again to collect some debugging logs?

curtin config:
maas <session> node get-curtin-config <system-id>

enable verbose debugging of curtin:
maas <session> maas set-config name=curtin_verbose value=true

In the node details page, it should display the curtin log.

>
>
> Please note that I am able to deploy these HP G5 servers if I add the
> line:
>
> cciss.blacklist=yes modprobe.blacklist=cciss hpsa.hpsa_allow_any=1
>
> to the Global Kernel Parameters (Boot parameters to pass to the kernel
> by default) section in the Settings for the Maas UI.
>
> However.. that does not help with the DL380-G4 Machine.
>

Revision history for this message
Robin (robinrego) wrote :

CURTIN PKG VERSION
robin@IbmRS1:~$ apt-cache policy python3-curtin
python3-curtin:
  Installed: (none)
  Candidate: 0.1.0~bzr385-0ubuntu1
  Version table:
     0.1.0~bzr385-0ubuntu1 0
        500 http://ppa.launchpad.net/maas/stable/ubuntu/ trusty/main amd64 Packages
     0.1.0~bzr227-0ubuntu1~14.04.1 0
        500 http://ca.archive.ubuntu.com/ubuntu/ trusty-updates/universe amd64 Packages
     0.1.0~bzr126-0ubuntu1 0
        500 http://ca.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages

MAAS VERSION:
robin@IbmRS1:~$ apt-cache policy maas{,-dns,-dhcp} | grep Installed -B1 -A1
maas:
  Installed: 1.9.3+bzr4577-0ubuntu1~trusty1
  Candidate: 1.9.3+bzr4577-0ubuntu1~trusty1
--
maas-dns:
  Installed: 1.9.3+bzr4577-0ubuntu1~trusty1
  Candidate: 1.9.3+bzr4577-0ubuntu1~trusty1
--
maas-dhcp:
  Installed: 1.9.3+bzr4577-0ubuntu1~trusty1
  Candidate: 1.9.3+bzr4577-0ubuntu1~trusty1

Revision history for this message
Ryan Harper (raharper) wrote :

On Thu, Jun 30, 2016 at 11:48 AM, Robin <email address hidden> wrote:

> CURTIN PKG VERSION
> robin@IbmRS1:~$ apt-cache policy python3-curtin
> python3-curtin:
> Installed: (none)
> Candidate: 0.1.0~bzr385-0ubuntu1
> Version table:
> 0.1.0~bzr385-0ubuntu1 0
> 500 http://ppa.launchpad.net/maas/stable/ubuntu/ trusty/main
> amd64 Packages
> 0.1.0~bzr227-0ubuntu1~14.04.1 0
> 500 http://ca.archive.ubuntu.com/ubuntu/ trusty-updates/universe
> amd64 Packages
> 0.1.0~bzr126-0ubuntu1 0
> 500 http://ca.archive.ubuntu.com/ubuntu/ trusty/universe amd64
> Packages
>

It looks like you didn't get the PPA package installed; it supplies
python3-curtin ~bzr403.

That's because the PPA only has a yakkety version. I'll work with Wesley
to get a trusty version of the curtin package into it and let you know.

Revision history for this message
Wesley Wiedenmeier (wesley-wiedenmeier) wrote :

I just updated my ppa to include a package for trusty, xenial and yakkety, so it should be possible to test with the updated package now. Sorry for the inconvenience.

Revision history for this message
Ryan Harper (raharper) wrote :

OK, the trusty version is now present, so you can run:

sudo add-apt-repository -y ppa:wesley-wiedenmeier/test2
sudo apt-get update
sudo apt-get install python3-curtin

The output of apt-cache policy python3-curtin should then point to the PPA:

python3-curtin:
  Installed: (none)
  Candidate: 0.1.0~bzr403-0ubuntu1
  Version table:
     0.1.0~bzr403-0ubuntu1 0
        500 http://ppa.launchpad.net/wesley-wiedenmeier/test2/ubuntu/ trusty/main amd64 Packages
     0.1.0~bzr227-0ubuntu1~14.04.1 0
        500 http://archive.ubuntu.com/ubuntu/ trusty-updates/universe amd64 Packages
     0.1.0~bzr126-0ubuntu1 0
        500 http://archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages
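The check Ryan describes, confirming that Installed (not just Candidate) is the PPA build, can be scripted. A hedged sketch with hypothetical function names: it reads the Installed line and compares only the bzrNNN part of the version, which works for the 0.1.0~bzrNNN packages in this thread but is not a full Debian version comparison (python-apt's apt_pkg.version_compare handles that properly).

```python
import re

# Trimmed copies of Robin's apt-cache policy output before and after the PPA install.
BEFORE = """python3-curtin:
  Installed: (none)
  Candidate: 0.1.0~bzr385-0ubuntu1
"""
AFTER = """python3-curtin:
  Installed: 0.1.0~bzr403-0ubuntu1
  Candidate: 0.1.0~bzr403-0ubuntu1
  Version table:
 *** 0.1.0~bzr403-0ubuntu1 0
        500 http://ppa.launchpad.net/wesley-wiedenmeier/test2/ubuntu/ trusty/main amd64 Packages
"""


def policy_fields(text: str) -> dict:
    """Hypothetical helper: pull Installed/Candidate; '(none)' becomes None."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in ("Installed", "Candidate"):
            value = value.strip()
            fields[key.lower()] = None if value == "(none)" else value
    return fields


def has_bzr_revision(text: str, wanted: int) -> bool:
    """SIMPLIFICATION: True if the installed bzrNNN revision is >= wanted."""
    installed = policy_fields(text).get("installed")
    match = re.search(r"bzr(\d+)", installed or "")
    return match is not None and int(match.group(1)) >= wanted


if __name__ == "__main__":
    print(policy_fields(BEFORE), has_bzr_revision(BEFORE, 403))
    print(policy_fields(AFTER), has_bzr_revision(AFTER, 403))
```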

Revision history for this message
Robin (robinrego) wrote :

Thanks for including the patch for trusty. Here are the new unsuccessful results:

robin@IbmRS1:~$ sudo apt-cache policy python3-curtin
[sudo] password for robin:
python3-curtin:
  Installed: 0.1.0~bzr403-0ubuntu1
  Candidate: 0.1.0~bzr403-0ubuntu1
  Version table:
 *** 0.1.0~bzr403-0ubuntu1 0
        500 http://ppa.launchpad.net/wesley-wiedenmeier/test2/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status
     0.1.0~bzr385-0ubuntu1 0
        500 http://ppa.launchpad.net/maas/stable/ubuntu/ trusty/main amd64 Packages
     0.1.0~bzr227-0ubuntu1~14.04.1 0
        500 http://ca.archive.ubuntu.com/ubuntu/ trusty-updates/universe amd64 Packages
     0.1.0~bzr126-0ubuntu1 0
        500 http://ca.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages

MAAS VERSION:
Installed: 1.9.3+bzr4577-0ubuntu1~trusty1

DL380-G4 Failed Deployment
Machine Output:

Error: Partition(s) 5 on /dev/cciss/c0d0 have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
File descriptor 3 (socket:[13947]) leaked on lvremove invocation. Parent PID 10046: python
File descriptor 4 (/tmp/install.log) leaked on lvremove invocation. Parent PID 10046: python
File descriptor 5 (/tmp/install.log) leaked on lvremove invocation. Parent PID 10046: python
  Volume group "MaaS" not found
  Skipping volume group MaaS
  Volume group name has invalid characters
File descriptor 3 (socket:[13947]) leaked on vgremove invocation. Parent PID 10046: python
File descriptor 4 (/tmp/install.log) leaked on vgremove invocation. Parent PID 10046: python
File descriptor 5 (/tmp/install.log) leaked on vgremove invocation. Parent PID 10046: python
  Volume group "MaaS" not found
File descriptor 3 (socket:[13947]) leaked on lvremove invocation. Parent PID 10046: python
File descriptor 4 (/tmp/install.log) leaked on lvremove invocation. Parent PID 10046: python
File descriptor 5 (/tmp/install.log) leaked on lvremove invocation. Parent PID 10046: python
  Volume group "MaaS" not found
  Skipping volume group MaaS
  Volume group name has invalid characters
File descriptor 3 (socket:[13947]) leaked on vgremove invocation. Parent PID 10046: python
File descriptor 4 (/tmp/install.log) leaked on vgremove invocation. Parent PID 10046: python
File descriptor 5 (/tmp/install.log) leaked on vgremove invocation. Parent PID 10046: python
  Volume group "MaaS" not found
Error: Partition(s) 5 on /dev/cciss/c0d0 have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
An error occured handling 'cciss!c0d0': ProcessExecutionError - Unexpected error while running command.
Command: ['parted', '/dev/cciss/c0d0', '--script', 'mklabel', 'msdos']
Exit code: 1
Reason: -
Stdout: ''
Stderr: ''
Unexpected error while running command.
Command: ['parted', '/dev/cciss/c0d0', '--script', 'mklabel', 'msdos']
Exit code: 1
Reason: -
Stdout: ''
Stderr: ''
Installation...


Revision history for this message
Ryan Harper (raharper) wrote :

On Thu, Jun 30, 2016 at 4:00 PM, Robin <email address hidden> wrote:

> Thanks for including the patch for trusty. Here are the new
> unsuccessful results:

> robin@IbmRS1:~$ sudo apt-cache policy python3-curtin
> [sudo] password for robin:
> python3-curtin:
> Installed: 0.1.0~bzr403-0ubuntu1
> Candidate: 0.1.0~bzr403-0ubuntu1
> Version table:
> *** 0.1.0~bzr403-0ubuntu1 0
> 500 http://ppa.launchpad.net/wesley-wiedenmeier/test2/ubuntu/
> trusty/main amd64 Packages
> 100 /var/lib/dpkg/status
> 0.1.0~bzr385-0ubuntu1 0
> 500 http://ppa.launchpad.net/maas/stable/ubuntu/ trusty/main
> amd64 Packages
> 0.1.0~bzr227-0ubuntu1~14.04.1 0
> 500 http://ca.archive.ubuntu.com/ubuntu/ trusty-updates/universe
> amd64 Packages
> 0.1.0~bzr126-0ubuntu1 0
> 500 http://ca.archive.ubuntu.com/ubuntu/ trusty/universe amd64
> Packages
>
>
> MAAS VERSION:
> Installed: 1.9.3+bzr4577-0ubuntu1~trusty1
>
>
> DL380-G4 Failed Deployment
> MAchine Outut:
>
> Error: Partition(s) 5 on /dev/cciss/c0d0 have been written, but we have
> been unable to inform the kernel of the change, probably because it/they
> are in use. As a result, the old partition(s) will remain in use. You
> should reboot now before making further changes.
> File descriptor 3 (socket:[13947]) leaked on lvremove invocation. Parent
> PID 10046: python
> File descriptor 4 (/tmp/install.log) leaked on lvremove invocation. Parent
> PID 10046: python
> File descriptor 5 (/tmp/install.log) leaked on lvremove invocation. Parent
> PID 10046: python
> Volume group "MaaS" not found
> Skipping volume group MaaS
> Volume group name has invalid characters
> File descriptor 3 (socket:[13947]) leaked on vgremove invocation. Parent
> PID 10046: python
> File descriptor 4 (/tmp/install.log) leaked on vgremove invocation. Parent
> PID 10046: python
> File descriptor 5 (/tmp/install.log) leaked on vgremove invocation. Parent
> PID 10046: python
> Volume group "MaaS" not found
> File descriptor 3 (socket:[13947]) leaked on lvremove invocation. Parent
> PID 10046: python
> File descriptor 4 (/tmp/install.log) leaked on lvremove invocation. Parent
> PID 10046: python
> File descriptor 5 (/tmp/install.log) leaked on lvremove invocation. Parent
> PID 10046: python
> Volume group "MaaS" not found
> Skipping volume group MaaS
> Volume group name has invalid characters
> File descriptor 3 (socket:[13947]) leaked on vgremove invocation. Parent
> PID 10046: python
> File descriptor 4 (/tmp/install.log) leaked on vgremove invocation. Parent
> PID 10046: python
> File descriptor 5 (/tmp/install.log) leaked on vgremove invocation. Parent
> PID 10046: python
> Volume group "MaaS" not found
> Error: Partition(s) 5 on /dev/cciss/c0d0 have been written, but we have
> been unable to inform the kernel of the change, probably because it/they
> are in use. As a result, the old partition(s) will remain in use. You
> should reboot now before making further changes.
> An error occured handling 'cciss!c0d0': ProcessExecutionError - Unexpected
> error while running command.
> Command: ['parted', '/dev/cciss/c0d0', '--script', 'mklabel', 'msdos']
> Exit code: 1
> Reason:...


Revision history for this message
Robin (robinrego) wrote :

I released one of the failing nodes (DL380-G4) and ran: maas <session> maas set-config name=curtin_verbose value=true
I then deployed the node and waited until it failed.
Then I ran: maas maaster node get-curtin-config node-1e6e4f34-3e88-11e6-8da5-001a640920e4
and this is the stdout:
robin@IbmRS1:~$ maas maaster node get-curtin-config node-1e6e4f34-3e88-11e6-8da5-001a640920e4
Success.
Machine-readable output follows:
apt_mirrors:
  ubuntu_archive: http://archive.ubuntu.com//ubuntu
  ubuntu_security: http://archive.ubuntu.com//ubuntu
apt_proxy: http://192.168.1.150:8000/
debconf_selections:
  maas: 'cloud-init cloud-init/datasources multiselect MAAS

    cloud-init cloud-init/maas-metadata-url string http://192.168.1.150/MAAS/metadata/

    cloud-init cloud-init/maas-metadata-credentials string oauth_token_key=BW2fFMWap4qWJAvA4w&oauth_token_secret=3xfcj8MAtuSRNhJpntC5w86yww2LVzhb&oauth_consumer_key=vnC7UzTtzRU47BvUFj

    cloud-init cloud-init/local-cloud-config string apt_preserve_sources_list:
    true\napt_proxy: http://192.168.1.150:8000/\nmanage_etc_hosts: false\nmanual_cache_clean:
    true\nreporting:\n maas: {consumer_key: vnC7UzTtzRU47BvUFj, endpoint: ''http://192.168.1.150/MAAS/metadata/status/node-1e6e4f34-3e88-11e6-8da5-001a640920e4'',\n token_key:
    BW2fFMWap4qWJAvA4w, token_secret: 3xfcj8MAtuSRNhJpntC5w86yww2LVzhb,\n type:
    webhook}\nsystem_info:\n package_mirrors:\n - arches: [i386, amd64]\n failsafe:
    {primary: ''http://archive.ubuntu.com/ubuntu'', security: ''http://security.ubuntu.com/ubuntu''}\n search:\n primary:
    [''http://archive.ubuntu.com/ubuntu'']\n security: [''http://archive.ubuntu.com/ubuntu'']\n -
    arches: [default]\n failsafe: {primary: ''http://ports.ubuntu.com/ubuntu-ports'',
    security: ''http://ports.ubuntu.com/ubuntu-ports''}\n search:\n primary:
    [''http://ports.ubuntu.com/ubuntu-ports'']\n security: [''http://ports.ubuntu.com/ubuntu-ports'']\n

    '
install:
  log_file: /tmp/install.log
  post_files:
  - /tmp/install.log
kernel:
  mapping: {}
  package: linux-generic
late_commands:
  maas:
  - wget
  - --no-proxy
  - http://192.168.1.150/MAAS/metadata/latest/by-id/node-1e6e4f34-3e88-11e6-8da5-001a640920e4/
  - --post-data
  - op=netboot_off
  - -O
  - /dev/null
network:
  config:
  - id: eth0
    mac_address: 00:14:c2:c1:fe:ad
    mtu: 1500
    name: eth0
    subnets:
    - address: 10.1.1.152/24
      dns_nameservers: []
      gateway: 10.1.1.100
      type: static
    type: physical
  - id: eth1
    mac_address: 00:14:c2:c1:fe:ac
    mtu: 1500
    name: eth1
    subnets:
    - type: manual
    type: physical
  - address: 192.168.1.150
    search:
    - maas
    type: nameserver
  version: 1
network_commands:
  builtin:
  - curtin
  - net-meta
  - custom
partitioning_commands:
  builtin:
  - curtin
  - block-meta
  - custom
power_state:
  mode: reboot
reporting:
  maas:
    consumer_key: vnC7UzTtzRU47BvUFj
    endpoint: http://192.168.1.150/MAAS/metadata/status/node-1e6e4f34-3e88-11e6-8da5-001a640920e4
    token_key: BW2fFMWap4qWJAvA4w
    token_secret: 3xfcj8MAtuSRNhJpntC5w86yww2LVzhb
    type: webhook
showtrace: true
storage:
  config:
  -...


Revision history for this message
Robin (robinrego) wrote :

I was not able to deploy the G4 or the G5 with Wily or Xenial images.

Machine Output G4:

start: cmd-install/stage-partitioning/builtin/cmd-block-meta: started: curtin command block-meta
start: cmd-install/stage-partitioning/builtin/cmd-block-meta: started: configuring disk: cciss!c0d0
get_path_to_storage_volume for volume cciss!c0d0
Processing serial 600508b1001844395353503958500047 via udev to 600508b1001844395353503958500047
devsync for /dev/cciss/c0d0
Running command ['partprobe', '/dev/cciss/c0d0'] with allowed return codes [0, 1] (shell=False, capture=False)
Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
devsync happy - path /dev/cciss/c0d0 now exists
return volume path /dev/cciss/c0d0
Running command ['mdadm', '--assemble', '--scan'] with allowed return codes [0, 1, 2] (shell=False, capture=True)
Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
clear_holders running on '/sys/class/block/cciss!c0d0/cciss!c0d0p1', with holders '[]'
wiping 1M on /dev/cciss/c0d0p1 at offsets [0, -1048576]
clear_holders running on '/sys/class/block/cciss!c0d0', with holders '[]'
wiping 1M on /dev/cciss/c0d0 at offsets [0, -1048576]
labeling device: '/dev/cciss/c0d0' with 'msdos' partition table
Running command ['parted', '/dev/cciss/c0d0', '--script', 'mklabel', 'msdos'] with allowed return codes [0] (shell=False, capture=False)
get_path_to_storage_volume for volume cciss!c0d0
Processing serial 600508b1001844395353503958500047 via udev to 600508b1001844395353503958500047
devsync for /dev/cciss/c0d0
Running command ['partprobe', '/dev/cciss/c0d0'] with allowed return codes [0, 1] (shell=False, capture=False)
Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
devsync happy - path /dev/cciss/c0d0 now exists
return volume path /dev/cciss/c0d0
Running command ['blkid', '-o', 'export', '/dev/cciss/c0d0'] with allowed return codes [0, 2] (shell=False, capture=True)
Writing dname udev rule '['SUBSYSTEM=="block"', 'ACTION=="add|change"', 'ENV{DEVTYPE}=="disk"', 'ENV{ID_PART_TABLE_UUID}=="9c73f80b"', 'SYMLINK+="disk/by-dname/cciss!c0d0"']'
finish: cmd-install/stage-partitioning/builtin/cmd-block-meta: SUCCESS: finished: configuring disk: cciss!c0d0
start: cmd-install/stage-partitioning/builtin/cmd-block-meta: started: configuring disk: cciss!c0d1
get_path_to_storage_volume for volume cciss!c0d1
Processing serial 600508b1001844395353503958500048 via udev to 600508b1001844395353503958500048
devsync for /dev/cciss/c0d1
Running command ['partprobe', '/dev/cciss/c0d1'] with allowed return codes [0, 1] (shell=False, capture=False)
Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
devsync happy - path /dev/cciss/c0d1 now exists
return volume path /dev/cciss/c0d1
Running command ['mdadm', '--assemble', '--scan'] with allowed return codes [0, 1, 2] (shell=False, capture=True)
Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
clear_holders running on '/sys/class/block/cciss!c0d1', with holders '[]'
wiping 1M on /dev/cciss/c0d1 at offsets...
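The log shows the ordering that clearing a disk needs: the partition (/sys/class/block/cciss!c0d0/cciss!c0d0p1) is handled before its parent disk. A post-order walk produces that order. The sketch below assumes a simple {disk: [partitions]} map as input and uses a hypothetical function name, so it mirrors the log rather than curtin's actual clear_holders implementation.

```python
SYSFS_BLOCK = "/sys/class/block"


def teardown_order(tree: dict, disk: str) -> list:
    """Hypothetical helper: sysfs paths children-first, partitions before their disk."""
    order = []

    def visit(node: str, parents: list) -> None:
        chain = parents + [node]
        for child in tree.get(node, []):
            visit(child, chain)
        # Partition directories are nested inside the disk's sysfs directory.
        order.append(SYSFS_BLOCK + "/" + "/".join(chain))

    visit(disk, [])
    return order


if __name__ == "__main__":
    layout = {"cciss!c0d0": ["cciss!c0d0p1", "cciss!c0d0p2", "cciss!c0d0p5"]}
    for path in teardown_order(layout, "cciss!c0d0"):
        print(path)
```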

Revision history for this message
DeeVee (deevee) wrote :

I'm getting the same issue with an HP 400i Smart Array controller.

Revision history for this message
Wesley Wiedenmeier (wesley-wiedenmeier) wrote :

I have published a new curtin package in my PPA which should contain a complete fix.

So far one user has reported being able to complete an installation with curtin at revision 414. If any more users who have reported this want to try the package out, it's available at:
https://launchpad.net/~wesley-wiedenmeier/+archive/ubuntu/test2/+packages

I am going to try to add some verification for cciss devices into curtin's vmtests using fake paths, but I am not sure how well that is going to work, so verification from physical systems is really useful.

Revision history for this message
Gustaf Nilsson (gustafnilsson) wrote :

Thanks for the patch. The new revision works like a charm for me. Good job!

Revision history for this message
DeeVee (deevee) wrote :

Sorry, newbie here. Could you pass along instructions for applying the fix? I will test as well.

Thanks!

Revision history for this message
DeeVee (deevee) wrote :

I got the install figured out; now I get the following:

An error occured handling 'cciss/c0d0': FileNotFoundError - [Errno 2] No such file or directory: '/tmp/tmp64mzihdf/scratch/rules.d/cciss/c0d0'
[Errno 2] No such file or directory: '/tmp/tmp64mzihdf/scratch/rules.d/cciss/c0d0'
Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'block-meta', 'custom']
Exit code: 3
Reason: -
Stdout: b"An error occured handling 'cciss/c0d0': FileNotFoundError - [Errno 2] No such file or directory: '/tmp/tmp64mzihdf/scratch/rules.d/cciss/c0d0'\n[Errno 2] No such file or directory: '/tmp/tmp64mzihdf/scratch/rules.d/cciss/c0d0'\n"
Stderr: ''

Changed in maas:
status: Confirmed → Invalid
Revision history for this message
Robin (robinrego) wrote :

I tried the new curtin package:

robin@IbmRS1:~$ sudo apt-cache policy python3-curtin
python3-curtin:
  Installed: 0.1.0~bzr414-0ubuntu1
  Candidate: 0.1.0~bzr414-0ubuntu1
  Version table:
 *** 0.1.0~bzr414-0ubuntu1 0
        500 http://ppa.launchpad.net/wesley-wiedenmeier/test2/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status
     0.1.0~bzr385-0ubuntu1 0
        500 http://ppa.launchpad.net/maas/stable/ubuntu/ trusty/main amd64 Packages
     0.1.0~bzr227-0ubuntu1~14.04.1 0
        500 http://ca.archive.ubuntu.com/ubuntu/ trusty-updates/universe amd64 Packages
     0.1.0~bzr126-0ubuntu1 0
        500 http://ca.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages
-----------------------------------------------------------------------------------

robin@IbmRS1:~$ sudo apt-cache policy maas
maas:
  Installed: 1.9.3+bzr4577-0ubuntu1~trusty1
  Candidate: 1.9.3+bzr4577-0ubuntu1~trusty1
  Version table:
 *** 1.9.3+bzr4577-0ubuntu1~trusty1 0
        500 http://ppa.launchpad.net/maas/stable/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status
     1.7.6+bzr3376-0ubuntu3~14.04.1 0
        500 http://ca.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
     1.5.4+bzr2294-0ubuntu1.2 0
        500 http://security.ubuntu.com/ubuntu/ trusty-security/main amd64 Packages
     1.5+bzr2252-0ubuntu1 0
        500 http://ca.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

-----------------------------------------------------------------------------------

Initially it appeared to work, as one of the HP DL380 G5 servers showed STDOUT similar to that of a successful deployment. However, I decided to delete all nodes and try again with enlisting, commissioning and deploying. As a result, all three HP servers now show failed deployment.

I will retry with a fresh install and post results.

Revision history for this message
Wesley Wiedenmeier (wesley-wiedenmeier) wrote :

Hi DeeVee,

Could you please post the configuration that was given to curtin when you installed? You can ask maas for the configuration with:
maas <session> node get-curtin-config <system-id>

I believe that what happened in your case was that the 'id' attribute for the disk at path '/dev/cciss/c0d0' was 'cciss/c0d0', which has a slash in it. Due to the way curtin generates dname rules, a slash in the id of a device with a name attribute leaves curtin unable to write the dname rule file, as the OS interprets the slash as a directory separator instead of just part of the filename.

If this is the case, then we have to decide whether it would be better for curtin to replace a slash in the id attribute for storage config elements when generating dname rules or for maas not to emit storage config ids with special characters.
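Wesley's explanation can be reproduced in a few lines: when a value containing a slash is used directly as a filename, the OS treats everything before the slash as a directory. This Python sketch assumes the rule file is named exactly after the key, as the error path '/tmp/tmp64mzihdf/scratch/rules.d/cciss/c0d0' suggests. The helper `try_write_rule` is hypothetical.

```python
import os
import tempfile
from typing import Optional


def try_write_rule(rules_dir: str, key: str) -> Optional[str]:
    """Write a rule file named after `key`; return the error class name, if any."""
    path = os.path.join(rules_dir, key)  # 'cciss/c0d0' becomes rules.d/cciss/c0d0
    try:
        with open(path, "w") as f:
            f.write("# dname rule placeholder\n")
    except OSError as exc:
        return type(exc).__name__
    return None


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        rules_dir = os.path.join(tmp, "rules.d")
        os.makedirs(rules_dir)
        # The '!' form is a plain filename; the '/' form needs a missing 'cciss' dir.
        return {key: try_write_rule(rules_dir, key)
                for key in ("cciss!c0d0", "cciss/c0d0")}


if __name__ == "__main__":
    print(demo())
```

Replacing the slash, or not deriving filenames from these values at all, avoids the error; those are the options discussed in this thread.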

Revision history for this message
Blake Rouse (blake-rouse) wrote :

Wesley,

"name: " should be used for dname not "id: ". "id: " should only be for curtin to reference other items in the yaml.

Revision history for this message
Wesley Wiedenmeier (wesley-wiedenmeier) wrote :

The 'name' attr is used for the actual target of the link in /dev/disk/by-dname/ but the configuration file itself has a filename based on 'id' at the moment. I have another branch where I have dname rules all generated in a single file during a later part of the install process. This may be a good time to update that branch and merge it into this one, so that the id is no longer used for the name of the rules file.
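A rough Python sketch of generating all dname rules into a single file, so that the filename no longer depends on any storage config value. The rule fields mirror the udev rule written in the installer log earlier in this thread. The function names and the 'dname.rules' filename are hypothetical, and this is not the code from Wesley's branch.

```python
import os
import tempfile
from typing import Iterable, Tuple


def dname_rule(name: str, ptuuid: str) -> str:
    # Same match keys and symlink target as the rule in the installer log.
    fields = [
        'SUBSYSTEM=="block"',
        'ACTION=="add|change"',
        'ENV{DEVTYPE}=="disk"',
        f'ENV{{ID_PART_TABLE_UUID}}=="{ptuuid}"',
        f'SYMLINK+="disk/by-dname/{name}"',
    ]
    return ", ".join(fields)


def write_dname_rules(rules_dir: str, disks: Iterable[Tuple[str, str]],
                      filename: str = "dname.rules") -> str:
    # One fixed (hypothetical) filename holds every rule, one rule per line.
    path = os.path.join(rules_dir, filename)
    with open(path, "w") as f:
        for name, ptuuid in disks:
            f.write(dname_rule(name, ptuuid) + "\n")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_dname_rules(tmp, [("cciss!c0d0", "9c73f80b")])
        with open(path) as f:
            print(f.read(), end="")
```

Note that the 'name' still ends up in the symlink target, so this alone does not help when the name itself contains a slash.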

Revision history for this message
Wesley Wiedenmeier (wesley-wiedenmeier) wrote :

The curtin deb in my ppa has been updated to no longer base the filenames for the dname .rules files on the storage config element id. This should resolve the remaining issue.

If anyone would like to test, the package has been published in lp:wesley-wiedenmeier/test2 for yakkety, xenial and trusty.

Thanks

Revision history for this message
DeeVee (deevee) wrote :

Hi Wesley,

I got a failed deployment again:

Leaving 'diversion of /etc/init/ureadahead.conf to /etc/init/ureadahead.conf.disabled by cloud-init'
Setting up swapspace version 1, size = 8 GiB (8589930496 bytes)
no label, UUID=8bbfdae8-9228-47ec-b256-99807d12d2bb
[Errno 21] Is a directory: '/tmp/tmp0wfxjsqg/scratch/rules.d/cciss'
Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'curthooks']
Exit code: 3
Reason: -

I will post the output of the commands you requested earlier in a bit.

Revision history for this message
Wesley Wiedenmeier (wesley-wiedenmeier) wrote :

Hi DeeVee,

Sorry about that, thanks for testing so many times.

With curtin version curtin_0.1.0~bzr415-0ubuntu1, the latest in the repo, the name of the dname rule file is based only on the 'name' attribute of the disk; the 'id' is no longer used for the filename. I think what happened in your case is that the name of the disk contained a slash. From the error you posted, I would guess that the 'name' attribute of your cciss device is 'cciss/c0d0' or something like that.

Because the name attribute is used as the target for a link in /dev/disk/by-dname it can't really have a slash in it.

Revision history for this message
Ryan Harper (raharper) wrote :

On Fri, Jul 8, 2016 at 5:53 PM, Wesley Wiedenmeier <
<email address hidden>> wrote:

> Hi DeeVee,
>
> Sorry about that, thanks for testing so many times.
>
> With curtin version curtin_0.1.0~bzr415-0ubuntu1, the latest in the
> repo, the name of the dname rule file is just based on the 'name'
> attribute of the disk, the 'id' is no longer used for the filename. I
> think what's happened for you is that the name of the disk contained a
> slash. From the error you posted I would guess that the 'name' attribute
> of your cciss device is 'cciss/c0d0' or something like that.
>
> Because the name attribute is used as the target for a link in
> /dev/disk/by-dname it can't really have a slash in it.
>

I bet maas is generating those names from what it discovered in commissioning.
We likely need to sanitize the name, much like we do with serial numbers, for example by replacing ['/', ' '] with '_'.


Revision history for this message
Wesley Wiedenmeier (wesley-wiedenmeier) wrote :

I can sanitize the names in the config file, but we should probably also emit a warning back to the controller that the name's been changed so we don't surprise a user who may have been expecting a name with a special character.

Revision history for this message
DeeVee (deevee) wrote :

Thanks guys; I'll keep testing if it keeps hope alive.

This is a proof of concept for me / lab and I was given a bunch of DL360 G5's with P400i controllers, so it's a brick wall I'm hitting right now.

Only caveat this next week is that I'm off camping with my boys from Monday to Thursday, so my testing may be delayed.

But definitely; thanks for your assistance!

I don't know if you still need the maas CLI output; even that is giving me issues.

Thanks again,

Darren

Revision history for this message
Robin (robinrego) wrote :

Unfortunately, my clean install test continues to produce Failed Deployment on HP DL380 G4 and G5 servers.

maas:
  Installed: 1.9.3+bzr4577-0ubuntu1~trusty1
python3-curtin:
  Installed: 0.1.0~bzr415-0ubuntu1

See attached files for detailed info:
'Fresh Test Jul10' which has my notes and steps involved in testing.

Revision history for this message
Robin (robinrego) wrote :
Revision history for this message
Robin (robinrego) wrote :
Revision history for this message
Robin (robinrego) wrote :
Revision history for this message
Robin (robinrego) wrote :
Revision history for this message
Robin (robinrego) wrote :

Please note that in spite of M3-hpDL380G5-28Ghz showing 'curtin installation finished', the node still shows failed deployment status.

See the attached file for the machine output of the deployed IBM x3650 and the failed HP DL380s.

Revision history for this message
Robin (robinrego) wrote :

Finally, success!!!
Using Xenial images for deployment allowed those HP nodes to Deploy Successfully.
Thank you.

Revision history for this message
Wesley Wiedenmeier (wesley-wiedenmeier) wrote :

There's a new package in the PPA that sanitizes dnames. It should work in the case encountered earlier, where installation failed due to dnames containing a slash.

The sanitization replaces any character other than A-Z, a-z, 0-9, - and _ with a dash and issues a warning that the dname had to be changed, so that the user is notified.
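The described behavior maps onto a single regular expression. Below is a minimal Python sketch, assuming the behavior exactly as described above: replace anything outside A-Z, a-z, 0-9, '-' and '_' with '-', then warn. `sanitize_dname` and the logger name are hypothetical and not necessarily what curtin uses.

```python
import logging
import re

LOG = logging.getLogger("dname-sketch")

# Anything that is not an ASCII letter, digit, '_' or '-' gets replaced.
_INVALID_DNAME_CHARS = re.compile(r"[^A-Za-z0-9_-]")


def sanitize_dname(dname: str) -> str:
    cleaned = _INVALID_DNAME_CHARS.sub("-", dname)
    if cleaned != dname:
        # Tell the user, since the by-dname link will differ from the config.
        LOG.warning("dname %r contains invalid characters, using %r instead",
                    dname, cleaned)
    return cleaned


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    for name in ("cciss/c0d0", "cciss!c0d0", "root_disk-1"):
        print(name, "->", sanitize_dname(name))
```

In this sketch 'cciss/c0d0' and 'cciss!c0d0' both become 'cciss-c0d0', so distinct names can collide after sanitizing; the warning is what makes such a change visible.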

Revision history for this message
DeeVee (deevee) wrote :

It worked !!!!! Woohoo!

Thank you!

Leaving 'diversion of /etc/init/ureadahead.conf to /etc/init/ureadahead.conf.disabled by cloud-init'
Setting up swapspace version 1, size = 4 GiB (4294963200 bytes)
no label, UUID=6be45c8e-433b-41c3-a56d-fecee46d5003
Error: Partition(s) 1 on /dev/cciss/c0d0 have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
Replacing config file /etc/default/grub with new version
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.4.0-28-generic
Found initrd image: /boot/initrd.img-4.4.0-28-generic
done
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.4.0-28-generic
Found initrd image: /boot/initrd.img-4.4.0-28-generic
done
Installing for i386-pc platform.
Installation finished. No error reported.
--2016-07-11 15:35:11-- http://192.168.x.x/MAAS/metadata/latest/by-id/4y3h7y/
Connecting to 192.168.x.x:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: '/dev/null'

     0K 67.1K=0s

2016-07-11 15:35:11 (67.1 KB/s) - '/dev/null' saved [2]

curtin: Installation finished.

Revision history for this message
Robin (robinrego) wrote :

Deploying the HP nodes with 6i and P400 controllers works from the MAAS UI. However, I am not able to successfully complete an OpenStack Autopilot installation with the fix in ppa 418. The HP nodes continue to get the status 'Failed deployment'.

python3-curtin:
  Installed: 0.1.0~bzr418-0ubuntu1
I can provide logs, but I was wondering whether there is something else I should be aware of, such as whether I would need to modify any OpenStack files or repositories, or wait until the fix is applied to them.

Revision history for this message
Robin (robinrego) wrote :

How can I force Xenial images to load on nodes during an OpenStack Autopilot install?
Currently (by default), Trusty images are being loaded.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

That's not configurable in the autopilot. You can upload xenial images (or any other) into the cloud yourself using standard openstack tools, like "glance" or the horizon GUI.

Revision history for this message
Robin (robinrego) wrote :

I'm not quite sure I have understood Andreas's answer to my previous question.

I am trying to carry out an openstack autopilot install on 6 machines.
M0 - IBM x3650 - Maas Server, Ubuntu 14.04 LTS
M1 - IBM x3650 - Node1
M2 - HP DL380 G5 - (P400 Raid controller) - Node 2
M3 - HP DL380 G5 - (P400 Raid controller) - Node 3
M4 - HP DL380 G4 - (6i Raid Controller) - Node 4
M5 - PC - node 5.

The reason for this bug report is that M2, M3 and M4 would fail deployment during the OpenStack cloud deployment.
This occurs after selecting the openstack components and then selecting hardware on which to deploy the cloud and then clicking on install.

I am using the PPA packages provided by Wesley, and these allow successful manual deployment from the MAAS UI only when I select 16.04 images for deployment (in the MAAS UI settings). However, during the Autopilot install, Landscape deploys the same machines to install the cloud but uses a 14.04 image, and thus these HP nodes fail deployment.

My question is: is there a way to make these HP machines request a Wily or Xenial image for successful cloud deployment?

Thanks.

Changed in curtin:
status: In Progress → Fix Committed
Ryan Harper (raharper)
description: updated
Revision history for this message
Andy Whitcroft (apw) wrote : Please test proposed package

Hello Robin, or anyone else affected,

Accepted curtin into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/curtin/0.1.0~bzr425-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed
Andy Whitcroft (apw)
Changed in curtin (Ubuntu):
status: New → Fix Released
Changed in curtin (Ubuntu Xenial):
status: New → Fix Committed
Revision history for this message
Jon Grimm (jgrimm) wrote :

Robin (robinrego), or anyone who has commented so far that this affects them:

Any chance you could verify that the fix in trusty proposed works for you? See

https://bugs.launchpad.net/landscape/+bug/1562249/comments/58

Thanks!!

Revision history for this message
Jon Grimm (jgrimm) wrote :

Err. I meant xenial-proposed.

Scott Moser (smoser)
Changed in curtin (Ubuntu Trusty):
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Robin (robinrego) wrote :

I am working on testing trusty-proposed, but I am unsure how to enable it on Ubuntu Server. I will read up and figure it out, but any help would be appreciated.

Thanks!!

Revision history for this message
Robin (robinrego) wrote :

Hi Jon and Scott. I tried to test the fix but was unable to. I am looking for someone to help me with this. Here is what I have tried so far.

First, I was hoping to test this fix on Ubuntu Server 14.04.5 LTS, but even after a fresh install I had no success. So I figured the fix here is only for Xenial, and started over.

After a fresh install of Xenial Server 16.04 LTS, I did the following to install the -proposed package:

sudo nano /etc/apt/sources.list:

added line -- > deb http://archive.ubuntu.com/ubuntu/ xenial-proposed restricted main multiverse universe

then, sudo nano /etc/apt/preferences.d/proposed-updates

Added the following lines -->

Package: *
Pin: release a=xenial-proposed
Pin-Priority: 400

then sudo apt-get upgrade

then .. robin@MaaS:~$ sudo apt-cache policy python3-curtin
python3-curtin:
  Installed: 0.1.0~bzr399-0ubuntu1~16.04.1
  Candidate: 0.1.0~bzr399-0ubuntu1~16.04.1
  Version table:
     0.1.0~bzr425-0ubuntu1~16.04.1 400
        400 http://archive.ubuntu.com/ubuntu xenial-proposed/main amd64 Packages
        400 http://archive.ubuntu.com/ubuntu xenial-proposed/main i386 Packages
 *** 0.1.0~bzr399-0ubuntu1~16.04.1 500
        500 http://ca.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        500 http://ca.archive.ubuntu.com/ubuntu xenial-updates/main i386 Packages
        100 /var/lib/dpkg/status
     0.1.0~bzr365-0ubuntu1 500
        500 http://ca.archive.ubuntu.com/ubuntu xenial/main amd64 Packages
        500 http://ca.archive.ubuntu.com/ubuntu xenial/main i386 Packages

I am guessing I am not able to get the -proposed package installed and must be doing something wrong. I tried to enlist, commission and deploy nodes with this, and the result is that only the HP nodes fail deployment. See the attached MAAS UI screenshot.

I am wondering if I need to install the file curtin_0.1.0~bzr425.orig.tar.gz, but I will need to learn how to do that over an SSH connection or on the server directly.

Thanks.

Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1562249] Re: Failed to deploy machine with HP Smart Array Raid 6i

On Sun, Oct 9, 2016 at 7:04 PM, Robin <email address hidden> wrote:

> Hi Jon and Scott.. I tried to test the fix but I am unable to do so. I
> am looking for someone to help me with this. Here is what I tried so
> far.
>
> First I was hoping to test this fix for ubuntu Server 14.04.5 LTS. Even
> after a fresh install, I had no success. So I figured the fix here is
> only for Xenial... and so I started over.
>

Thanks for giving this a try. Sorry for the confusion; you guessed
correctly, the update is for Xenial.

>
> After a fresh install of Xenial server 16.04 LTS. I did the following
> to install the -proposed pkg.
>
> sudo nano /etc/apt/sources.list:
>
> added line -- > deb http://archive.ubuntu.com/ubuntu/ xenial-proposed
> restricted main multiverse universe
>

All you should need to do on your MAAS host is:

  echo "deb http://archive.ubuntu.com/ubuntu/ xenial-proposed restricted main multiverse universe" | sudo tee -a /etc/apt/sources.list
  sudo apt update; sudo apt install curtin

> then, sudo nano /etc/apt/preferences.d/proposed-updates
>
> Added the following lines -->
>
> Package: *
> Pin: release a=xenial-proposed
> Pin-Priority: 400
>

The Pin config is not needed.
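The pin is also what kept the proposed package from being selected in Robin's policy output. Per apt_preferences(5), apt picks as candidate the version with the highest pin priority and, among equal priorities, the higher version; 400 for xenial-proposed loses to 500 for xenial-updates, so bzr399 stays the candidate. The Python sketch below is a simplified model of that rule (it ignores priorities of 1000 and above and other special cases), combined with a port of dpkg's version comparison. It is not apt's implementation, and the unpinned case assumes xenial-proposed would otherwise get the default priority of 500.

```python
from typing import List, Tuple


def _order(s: str, i: int) -> int:
    # dpkg ordering: end of string and digits are 0, '~' sorts before everything,
    # letters by code point, other characters after all letters.
    if i >= len(s) or s[i].isdigit():
        return 0
    if s[i] == "~":
        return -1
    if s[i].isalpha():
        return ord(s[i])
    return ord(s[i]) + 256


def _verrevcmp(a: str, b: str) -> int:
    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the non-digit run character by character.
        while (i < len(a) and not a[i].isdigit()) or (j < len(b) and not b[j].isdigit()):
            ac, bc = _order(a, i), _order(b, j)
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Compare the digit run numerically, ignoring leading zeros.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        first_diff = 0
        while i < len(a) and j < len(b) and a[i].isdigit() and b[j].isdigit():
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and a[i].isdigit():
            return 1
        if j < len(b) and b[j].isdigit():
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str) -> Tuple[int, str, str]:
    # [epoch:]upstream[-revision]
    epoch, rest = version.split(":", 1) if ":" in version else ("0", version)
    upstream, _, revision = rest.rpartition("-") if "-" in rest else (rest, "", "")
    return int(epoch), upstream, revision


def compare_versions(a: str, b: str) -> int:
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return ea - eb
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)


def candidate(table: List[Tuple[str, int]]) -> str:
    # Highest priority wins; equal priorities fall back to the newer version.
    best_ver, best_prio = table[0]
    for ver, prio in table[1:]:
        if prio > best_prio or (prio == best_prio and compare_versions(ver, best_ver) > 0):
            best_ver, best_prio = ver, prio
    return best_ver


if __name__ == "__main__":
    pinned = [
        ("0.1.0~bzr425-0ubuntu1~16.04.1", 400),  # xenial-proposed, pinned to 400
        ("0.1.0~bzr399-0ubuntu1~16.04.1", 500),  # xenial-updates
        ("0.1.0~bzr365-0ubuntu1", 500),          # xenial release
    ]
    print(candidate(pinned))
    print(candidate([(v, 500) for v, _ in pinned]))  # assumed default priority
```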

>
> then sudo apt-get upgrade
>

I typically specify the package; we don't need you to upgrade everything.

>
> then .. robin@MaaS:~$ sudo apt-cache policy python3-curtin
> python3-curtin:
> Installed: 0.1.0~bzr399-0ubuntu1~16.04.1
> Candidate: 0.1.0~bzr399-0ubuntu1~16.04.1
> Version table:
> 0.1.0~bzr425-0ubuntu1~16.04.1 400
> 400 http://archive.ubuntu.com/ubuntu xenial-proposed/main amd64
> Packages
> 400 http://archive.ubuntu.com/ubuntu xenial-proposed/main i386
> Packages
> *** 0.1.0~bzr399-0ubuntu1~16.04.1 500
> 500 http://ca.archive.ubuntu.com/ubuntu xenial-updates/main amd64
> Packages
> 500 http://ca.archive.ubuntu.com/ubuntu xenial-updates/main i386
> Packages
> 100 /var/lib/dpkg/status
> 0.1.0~bzr365-0ubuntu1 500
> 500 http://ca.archive.ubuntu.com/ubuntu xenial/main amd64 Packages
> 500 http://ca.archive.ubuntu.com/ubuntu xenial/main i386 Packages
>
> I am guessing I am not able to get the -proposed pkg installed and I must
> be doing something wrong. I tried to enlist, commission and deploy
> nodes with this .. and the result is that only the hp nodes fail
> deployment. See attached maas UI screenshot.
>

It appears that the updated curtin isn't installed. Try installing it directly
with sudo apt install curtin, as your policy shows it's available but not yet
installed.

>
> I am wondering if I need to install the file:
> curtin_0.1.0~bzr425.orig.tar.gz but I will need to learn how to
> do that over SSH connection or on the server directly.
>

The curtin package needs to be updated only on the MAAS host; MAAS handles
getting curtin over to the ephemeral environment.

>
> Thanks.
>
>
>
> ** Attachment added: "MaaS UI hp nodes fail deployment.PNG"
> https://bugs.launchpad.net/landscape/+bug/1562249/+attachment/4758425/+files/MaaS%20UI%20hp%20nodes%20fail%20deployment.PNG

Revision history for this message
Robin (robinrego) wrote :

I was able to install and test the updated curtin pkg "curtin 0.1.0~bzr425-0ubuntu1~16.04.1".

It resolves my problem; I am now able to deploy all the servers.

Thank you Ryan and everyone else for the updated pkg and guidance in testing.

Robin (robinrego)
tags: added: verification-done
removed: verification-needed
Revision history for this message
Robin (robinrego) wrote :

Please let me know whether an updated 'curtin 0.1.0~bzr425' package will be built for 14.04 Trusty as well.

I would be happy to test it as well as test the Autopilot set up on this rig.

Thanks.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 0.1.0~bzr425-0ubuntu1~16.04.1

---------------
curtin (0.1.0~bzr425-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  [ Scott Moser ]
  * debian/new-upstream-snapshot: add writing of debian changelog entries.

  [ Ryan Harper ]
  * New upstream snapshot.
    - unittest,tox.ini: catch and fix issue with trusty-level mock of open
    - block/mdadm: add option to ignore mdadm_assemble errors (LP: #1618429)
    - curtin/doc: overhaul curtin documentation for readthedocs.org
      (LP: #1351085)
    - curtin.util: re-add support for RunInChroot (LP: #1617375)
    - curtin/net: overhaul of eni rendering to handle mixed ipv4/ipv6 configs
    - curtin.block: refactor clear_holders logic into block.clear_holders and
      cli cmd
    - curtin.apply_net should exit non-zero upon exception. (LP: #1615780)
    - apt: fix bug in disable_suites if sources.list line is blank.
    - vmtests: disable Wily in vmtests
    - Fix the unittests for test_apt_source.
    - get CURTIN_VMTEST_PARALLEL shown correctly in jenkins-runner output
    - fix vmtest check_file_strippedline to strip lines before comparing
    - fix whitespace damage in tests/vmtests/__init__.py
    - fix dpkg-reconfigure when debconf_selections was provided.
      (LP: #1609614)
    - fix apt tests on non-intel arch
    - Add apt features to curtin. (LP: #1574113)
    - vmtest: easier use of parallel and controlling timeouts
    - mkfs.vfat: add force flag for formating whole disks (LP: #1597923)
    - block.mkfs: fix sectorsize flag (LP: #1597522)
    - block_meta: cleanup use of sys_block_path and handle cciss knames
      (LP: #1562249)
    - block.get_blockdev_sector_size: handle _lsblock multi result return
      (LP: #1598310)
    - util: add target (chroot) support to subp, add target_path helper.
    - block_meta: fallback to parted if blkid does not produce output
      (LP: #1524031)
    - commands.block_wipe: correct default wipe mode to 'superblock'
    - tox.ini: run coverage normally rather than separately
    - move uefi boot knowledge from launch and vmtest to xkvm

 -- Ryan Harper <email address hidden> Mon, 03 Oct 2016 13:43:54 -0500

Changed in curtin (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Martin Pitt (pitti) wrote : Update Released

The verification of the Stable Release Update for curtin has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Scott Moser (smoser) wrote : Fixed in Curtin 17.1

This bug is believed to be fixed in curtin in 17.1. If this is still a problem for you, please make a comment and set the state back to New

Thank you.

Changed in curtin:
status: Fix Committed → Fix Released