VMware duplicate serial numbers cause partitioning failures

Asked by OSAM1 System Admin

I'd like to create a software RAID 1 across two NVMe disks on a VMware guest. Storage config is below.
The problem is that, by default, VMware derives each disk's ID_SERIAL solely from the controller ID. For example, if the controller ID is 0, the disk's ID_SERIAL is "VMware Virtual NVMe Disk_VMware NVME_0000".

Therefore, if both disks are on the same controller, they have identical serial numbers. Partitioning fails because subiquity puts the duplicate serial numbers into the storage config, which overrides the device path and causes curtin to issue the sgdisk command against the wrong disk.
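The failure mode can be sketched with a minimal model. This is an illustration only, not curtin's actual code: it assumes the installer resolves a disk by its serial and takes the first match, so when both disks report the same serial, everything resolves to the first device node.

```python
# Two NVMe disks on the same VMware controller: identical serials.
disks = [
    {"path": "/dev/nvme0n1", "serial": "VMware Virtual NVMe Disk_VMware NVME_0000"},
    {"path": "/dev/nvme0n2", "serial": "VMware Virtual NVMe Disk_VMware NVME_0000"},
]

def path_for_serial(serial):
    """Return the device path of the first disk whose serial matches."""
    for disk in disks:
        if disk["serial"] == serial:
            return disk["path"]
    raise LookupError(serial)

# Both config entries carry the same serial, so a lookup intended for the
# second disk still resolves to the first device node:
print(path_for_serial("VMware Virtual NVMe Disk_VMware NVME_0000"))  # /dev/nvme0n1
```

That first-match behaviour matches the curtin log below, where a partition belonging to disk-nvme0n2 is prepared on /dev/nvme0n1.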

One workaround is to move the second disk to another NVMe controller, so that its serial number differs ("... NVME_0001").

I haven't (yet) found a way in VMware to change the disk serial number to a user-specified string.

Perhaps subiquity could provide a way to ignore the disk serial number (to prefer device path instead)?

Input storage config (excerpt):

  storage:
    config:
# for 2 NVMe disks in RAID1
# Partition table of two disks
    - { ptable: gpt, path: /dev/nvme0n1, wipe: superblock-recursive, preserve: false, name: '', grub_device: false, type: disk, id: disk-nvme0n1 }
    - { ptable: gpt, path: /dev/nvme0n2, wipe: superblock-recursive, preserve: false, name: '', grub_device: false, type: disk, id: disk-nvme0n2 }
# Install GRUB on first disk
    - { device: disk-nvme0n1, size: 1G, wipe: superblock, flag: boot, number: 1, preserve: false, grub_device: true, type: partition, id: partition-3 }
    - { fstype: fat32, volume: partition-3, preserve: false, type: format, id: format-2 }
# Install GRUB on second disk
    - { device: disk-nvme0n2, size: 1G, wipe: superblock, flag: boot, number: 1, preserve: false, grub_device: true, type: partition, id: partition-8 }
    - { fstype: fat32, volume: partition-8, preserve: false, type: format, id: format-5 }
# create partitions on both disks for /boot
    - { device: disk-nvme0n1, size: 2G, wipe: superblock, flag: '', number: 2, preserve: false, grub_device: false, type: partition, id: partition-9 }
    - { device: disk-nvme0n2, size: 2G, wipe: superblock, flag: '', number: 2, preserve: false, grub_device: false, type: partition, id: partition-10 }
# create partitions on both disks for the LVM, filling the disk
    - { device: disk-nvme0n1, size: -1, wipe: superblock, flag: '', number: 3, preserve: false, grub_device: false, type: partition, id: partition-11 }
    - { device: disk-nvme0n2, size: -1, wipe: superblock, flag: '', number: 3, preserve: false, grub_device: false, type: partition, id: partition-12 }

The resulting subiquity-partitioning.conf (excerpt), with duplicate serial numbers, is below.

storage:
  config:
  - ptable: gpt
    serial: VMware Virtual NVMe Disk_VMware NVME_0000
    path: /dev/nvme0n1
    wipe: superblock-recursive
    preserve: false
    name: ''
    grub_device: false
    id: disk-nvme0n1
    type: disk
  - ptable: gpt
    serial: VMware Virtual NVMe Disk_VMware NVME_0000
    path: /dev/nvme0n2
    wipe: superblock-recursive
    preserve: false
    name: ''
    grub_device: false
    id: disk-nvme0n2
    type: disk
  - device: disk-nvme0n1
    size: 1073741824
    wipe: superblock
    flag: boot
    number: 1
    preserve: false
    grub_device: true
    id: partition-3
    type: partition
  - fstype: fat32
    volume: partition-3
    preserve: false
    id: format-2
    type: format
  - device: disk-nvme0n2
    size: 1073741824
    wipe: superblock
    flag: boot
    number: 1
    preserve: false
    grub_device: true
    id: partition-8
    type: partition
  - fstype: fat32
    volume: partition-8
    preserve: false
    id: format-5
    type: format
[...]

The curtin log shows that it is attempting to partition nvme0n1 instead of nvme0n2, even though "partition-8" is on the second disk:

get_blockdev_sector_size: (log=512, phys=512)
nvme0n1 logical_block_size_bytes: 512
adding partition 'partition-8' to disk 'disk-nvme0n2' (ptable: 'gpt')
partnum: 1 offset_sectors: 2048 length_sectors: 2097151
Preparing partition location on disk /dev/nvme0n1
Wiping 1M on /dev/nvme0n1 at offset 1048576
Running command ['sgdisk', '--new', '1:2048:2099199', '--typecode=1:ef00', '/dev/nvme0n1'] with allowed return codes [0] (capture=True)
An error occured handling 'partition-8': ProcessExecutionError - Unexpected error while running command.
Command: ['sgdisk', '--new', '1:2048:2099199', '--typecode=1:ef00', '/dev/nvme0n1']
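
A pre-flight sanity check could catch this before curtin runs. The sketch below is hypothetical (not a subiquity feature): it scans the disk entries of a generated storage config for serials that appear more than once.

```python
from collections import Counter

def duplicate_serials(config):
    """Return any serial that appears on more than one 'type: disk' entry."""
    serials = [a["serial"] for a in config
               if a.get("type") == "disk" and "serial" in a]
    return sorted(s for s, n in Counter(serials).items() if n > 1)

# Sample entries mirroring the subiquity-partitioning.conf excerpt above:
config = [
    {"type": "disk", "id": "disk-nvme0n1", "path": "/dev/nvme0n1",
     "serial": "VMware Virtual NVMe Disk_VMware NVME_0000"},
    {"type": "disk", "id": "disk-nvme0n2", "path": "/dev/nvme0n2",
     "serial": "VMware Virtual NVMe Disk_VMware NVME_0000"},
    {"type": "partition", "id": "partition-3", "device": "disk-nvme0n1"},
]

print(duplicate_serials(config))  # ['VMware Virtual NVMe Disk_VMware NVME_0000']
```

An empty result means every disk entry can be matched unambiguously by serial.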

Question information

Language: English
Status: Solved
For: Ubuntu subiquity
Assignee: No assignee
Solved by: actionparsnip
Best actionparsnip (andrew-woodhead666) said:
#1

I don't think VMware supports RAID like this in a VM. It isn't needed either, as the underlying storage provides the disk resilience. What are you going to do, pull a disk from a VM to put a new one in? It doesn't make sense.

If you are just exploring software RAID, you could remove the second disk, add a second NVMe controller to the VM, then attach the second disk to that controller, which will give it a different serial (if what you are saying is true). That should work.

OSAM1 System Admin (osam1sysadmin) said:
#2

I agree that this configuration doesn't make sense for a production VM.

I was using VMware to emulate an actual physical system, to test my autoinstall configuration. On the physical system, the NVMe drives are on the same controller, so that's what I tried to emulate in the VM.

I thought maybe subiquity should handle this weird edge case, but on further reflection it's probably more a case of "don't do that!".

Maybe someone else using VMware with NVMe controllers/disks will see this and it will help.

OSAM1 System Admin (osam1sysadmin) said:
#3

Thanks actionparsnip, that solved my question.