How to configure a physical NVIDIA GPU (instead of a vGPU) with nvidia_gpu_driver

Asked by PEScn

I am installing Cyborg with kolla-ansible (master branch) on Ubuntu 22.04.
I edited cyborg_agent's /etc/cyborg/cyborg.conf following https://youtu.be/WUdkS9558p8?t=930.
But if I only set [agent]/enabled_drivers = nvidia_gpu_driver, I get this error:

2023-05-17 09:22:14.528 7 ERROR cyborg.accelerator.drivers.gpu.nvidia.sysinfo [-] Unable to load vGPU_type from [gpu_devices] Ensure "enabled_vgpu_types" is set.

When I set [gpu_devices]/enabled_vgpu_types = nvidia-35 and add [vgpu_nvidia-35]/device_addresses = 0000:3b:00.0,0000:86:00.0 (3b:00.0 and 86:00.0 are the addresses of the Tesla P4 GPUs), I get a different error:

2023-05-17 09:59:03.660 7 ERROR oslo_service.periodic_task [-] Error during AgentManager.update_available_resource: FileNotFoundError: [Errno 2] No such file or directory: '/sys/bus/pci/devices/0000:3b:00.0/mdev_supported_types'
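The second error comes from the agent reading /sys/bus/pci/devices/<address>/mdev_supported_types. As far as I understand, that directory only appears when a driver that supports mediated devices (for NVIDIA cards, the vGPU host driver) is bound to the GPU; with vfio-pci bound (see the lspci output in this question) it is absent, so the card can only be used as a whole, passed-through GPU. A minimal Python sketch to check each address before listing it under a [vgpu_<type>] section; mdev_supported_types() is a hypothetical helper, not part of Cyborg:

```python
import os
from typing import List, Optional

SYSFS_PCI_DEVICES = "/sys/bus/pci/devices"


def mdev_supported_types(pci_address: str, sysfs_root: str = SYSFS_PCI_DEVICES) -> Optional[List[str]]:
    """Return the mdev type names a PCI device advertises, or None when the
    device has no mdev_supported_types directory (no vGPU types exposed)."""
    path = os.path.join(sysfs_root, pci_address, "mdev_supported_types")
    if not os.path.isdir(path):
        return None
    # Each subdirectory name is one mdev type, e.g. "nvidia-35".
    return sorted(os.listdir(path))


if __name__ == "__main__":
    # The Tesla P4 addresses from this question.
    for address in ("0000:3b:00.0", "0000:86:00.0"):
        types = mdev_supported_types(address)
        if types is None:
            print(address, "has no mdev_supported_types (whole-GPU passthrough only)")
        else:
            print(address, "supports", ", ".join(types))
```

If an address prints "has no mdev_supported_types", listing it under a [vgpu_<type>] section would be expected to reproduce the FileNotFoundError above.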

Additional Information:

ubuntu@os2:~$ lspci -nnk -s 3b:00.0
3b:00.0 3D controller [0302]: NVIDIA Corporation GP104GL [Tesla P4] [10de:1bb3] (rev a1)
        Subsystem: NVIDIA Corporation GP104GL [Tesla P4] [10de:11d8]
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau
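To confirm which vendor/device IDs and kernel driver a card reports, the lspci -nnk output can be parsed. A sketch, assuming the single-device output format shown above; parse_lspci_nnk() is a hypothetical helper:

```python
import re
from typing import Dict, Optional

# Matches a "[vendor:device]" pair such as "[10de:1bb3]"; the class code "[0302]" has no colon.
ID_PAIR = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")


def parse_lspci_nnk(text: str) -> Dict[str, Optional[str]]:
    """Extract vendor ID, device ID and bound kernel driver from `lspci -nnk -s <addr>`."""
    info: Dict[str, Optional[str]] = {"vendor_id": None, "device_id": None, "driver": None}
    lines = text.strip().splitlines()
    if not lines:
        return info
    # The first line carries the device's own IDs; the Subsystem line is ignored.
    match = ID_PAIR.search(lines[0])
    if match:
        info["vendor_id"], info["device_id"] = match.group(1), match.group(2)
    for line in lines[1:]:
        key, sep, value = line.strip().partition(":")
        if sep and key == "Kernel driver in use":
            info["driver"] = value.strip()
    return info
```

For the Tesla P4 above this yields vendor 10de, device 1bb3 and driver vfio-pci, which are the same vendor and product_id values that appear in the GPU rows of the device list later in this thread.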

IOMMU is enabled.

Question information

Language: English
Status: Solved
For: Cyborg (OpenStack)
Assignee: No assignee
Solved by: PEScn
Wenping Song (wenping1) said:
#1

Hi,
If you use a pGPU (the whole physical GPU), you don't need to configure anything except enabled_drivers = nvidia_gpu_driver. The error message 'ERROR cyborg.accelerator.drivers.gpu.nvidia.sysinfo [-] Unable to load vGPU_type from [gpu_devices] Ensure "enabled_vgpu_types" is set.' is misleading; it should be a warning. Can't you get the GPU device list with 'openstack accelerator device list' after setting enabled_drivers?
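The options involved are [agent]/enabled_drivers, [gpu_devices]/enabled_vgpu_types, and one [vgpu_<type>] section with device_addresses for each enabled type. Cyborg itself loads cyborg.conf through oslo.config; the sketch below uses Python's standard configparser instead, purely to show which of these values a given file sets. read_gpu_options() is a hypothetical helper:

```python
import configparser
from typing import Dict, List


def _split_list(raw: str) -> List[str]:
    """Split a comma-separated option value, dropping empty items."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def read_gpu_options(conf_text: str) -> Dict[str, object]:
    """Collect the GPU-related options from a cyborg.conf-style INI string."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # fallback="" also covers a missing section, so a pGPU-only file parses cleanly.
    drivers = _split_list(parser.get("agent", "enabled_drivers", fallback=""))
    vgpu_types = _split_list(parser.get("gpu_devices", "enabled_vgpu_types", fallback=""))
    addresses: Dict[str, List[str]] = {}
    for vgpu_type in vgpu_types:
        # Each enabled type is expected to have its own [vgpu_<type>] section.
        raw = parser.get(f"vgpu_{vgpu_type}", "device_addresses", fallback="")
        addresses[vgpu_type] = _split_list(raw)
    return {"enabled_drivers": drivers, "enabled_vgpu_types": vgpu_types, "device_addresses": addresses}


if __name__ == "__main__":
    pgpu_only = "\n".join(["[agent]", "enabled_drivers = nvidia_gpu_driver"])
    print(read_gpu_options(pgpu_only))
```

For a pGPU-only file like the one in the demo, enabled_vgpu_types and device_addresses come back empty, which is the configuration described in this reply.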

PEScn (pescn) said:
#2

Right, I can't: the device list only contains the fake_driver devices:
+--------------------------------------+------+--------+----------+------------------------------------------------+
| uuid                                 | type | vendor | hostname | std_board_info                                 |
+--------------------------------------+------+--------+----------+------------------------------------------------+
| c2034999-654e-4383-89d3-153b52e3b59d | FPGA | 0xABCD | os3      | {"device_id": "0xabcd", "class": "Fake class"} |
| f9f06119-f643-4c0f-a8aa-d4a30049c648 | FPGA | 0xABCD | os4      | {"device_id": "0xabcd", "class": "Fake class"} |
| b0d0893d-1c49-48d5-ab6a-978a87e18939 | FPGA | 0xABCD | os2      | {"device_id": "0xabcd", "class": "Fake class"} |
+--------------------------------------+------+--------+----------+------------------------------------------------+

But after reading the code, I added the following to the agent's config:

[agent]
enabled_drivers = nvidia_gpu_driver,fake_driver

[gpu_devices]
enabled_vgpu_types = nvidia-0

[vgpu_nvidia-0]
device_addresses = 0000:00:00.0

And it works! The GPUs now show up:

+--------------------------------------+------+--------+----------+-------------------------------------------------------+
| uuid                                 | type | vendor | hostname | std_board_info                                        |
+--------------------------------------+------+--------+----------+-------------------------------------------------------+
| c2034999-654e-4383-89d3-153b52e3b59d | FPGA | 0xABCD | os3      | {"device_id": "0xabcd", "class": "Fake class"}        |
| f9f06119-f643-4c0f-a8aa-d4a30049c648 | FPGA | 0xABCD | os4      | {"device_id": "0xabcd", "class": "Fake class"}        |
| b0d0893d-1c49-48d5-ab6a-978a87e18939 | FPGA | 0xABCD | os2      | {"device_id": "0xabcd", "class": "Fake class"}        |
| 047e1035-87c5-4da6-9419-b23c1c52333a | GPU  | 10de   | os2      | {"product_id": "1bb3", "controller": "3D controller"} |
| 1f3c8e5b-0db2-424c-badd-974b6c0998d8 | GPU  | 10de   | os3      | {"product_id": "1bb3", "controller": "3D controller"} |
| 81d5468c-768d-4fcc-bd69-012ae2010433 | GPU  | 10de   | os4      | {"product_id": "1bb3", "controller": "3D controller"} |
| 956da552-1e6b-4913-8922-b1d042f3abd1 | GPU  | 10de   | os3      | {"product_id": "1bb3", "controller": "3D controller"} |
| 362e005a-b160-4609-9a1f-fca3ac81cabb | GPU  | 10de   | os4      | {"product_id": "1bb3", "controller": "3D controller"} |
| 148f853d-7cc0-4b5b-9d50-016ea924c075 | GPU  | 10de   | os2      | {"product_id": "1bb3", "controller": "3D controller"} |
+--------------------------------------+------+--------+----------+-------------------------------------------------------+
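To check that every compute host reports its GPUs, the rows of this listing can be counted per host. A sketch, assuming the rows have already been loaded as dictionaries keyed by the column names shown (for example from the OpenStack client's standard -f json output option, if the accelerator plugin supports it); gpus_per_host() is a hypothetical helper:

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def gpus_per_host(rows: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count device rows whose type is GPU, grouped by the hostname column."""
    counts = Counter(row["hostname"] for row in rows if row["type"] == "GPU")
    # Sort by hostname so the result is stable and easy to compare.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # A few rows in the shape of the listing above (uuid and std_board_info omitted).
    sample = [
        {"type": "FPGA", "hostname": "os2"},
        {"type": "GPU", "hostname": "os2"},
        {"type": "GPU", "hostname": "os3"},
    ]
    print(gpus_per_host(sample))  # {'os2': 1, 'os3': 1}
```

Applied to the full listing above, this gives two GPUs on each of os2, os3 and os4, while the fake_driver FPGA rows are ignored.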