Not able to attach a Solaris iSCSI volume to an instance

Asked by Nilanjan Roy on 2011-12-13

I have done a dual-node setup where one node (cloud-controller) runs all the services (nova-network, nova-volume, nova-compute, nova-api, nova-scheduler, etc.) and the other node (compute-node) runs only nova-compute. The architecture is the same as in http://docs.openstack.org/diablo/openstack-compute/starter/content/Introduction-d1e390.html.

I am trying to use ZFS + COMSTAR based iSCSI storage as the storage backend of the nova-volume service. The /etc/nova/nova.conf file of the controller node looks like this:

--dhcpbridge_flagfile=/etc/nova/nova.conf
--dhcpbridge=/usr/bin/nova-dhcpbridge
--logdir=/var/log/nova
--state_path=/var/lib/nova
--lock_path=/var/lock/nova
--verbose
--my_ip=10.10.10.2
--s3_host=10.10.10.2
--rabbit_host=10.10.10.2
--cc_host=10.10.10.2
--nova_url=http://10.10.10.2:8774/v1.1/
--fixed_range=192.168.0.0/16
--network_size=10
#--force_dhcp_release=True
--routing_source_ip=10.10.10.2
--sql_connection=postgresql://novadbadmin:novasecret@10.10.10.2/nova
--glance_api_servers=10.10.10.2:9292
--image_service=nova.image.glance.GlanceImageService
#--iscsi_ip_prefix=192.168.161.18
--vlan_interface=eth2
--vlan_start=3
#--vlan_interface=br100
--use_deprecated_auth
--public_interface=eth0

#VPN related flags
#--vpn_image_id=39
#--use_project_ca
#--cnt_vpn_clients=5

#Volume related flags
--volume_manager=nova.volume.manager.VolumeManager
--volume_driver=nova.volume.san.SolarisISCSIDriver
--iscsi_ip_prefix=192.168.161.18 # This is the IP address of the Solaris iSCSI target server.
--san_ip=192.168.161.18
--san_login=nilanjan
--san_password=nilanjan123
--use_local_volumes=False
#--nouse_local_volumes
--poolname=nova
#Added two lines of code in nova/volume/san.py file to take the ZFS pool name on which the volumes will be created
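For context, --poolname is my own addition, not a stock Diablo flag: it is combined with nova's volume name to form both the ZFS dataset and the raw zvol device path that sbdadm create-lu is given. A minimal illustrative sketch of that mapping (function names here are made up, not nova's):

```python
# Illustrative only -- the real change lives in nova/volume/san.py.
# The --poolname flag plus nova's volume name yield the ZFS dataset
# ("nova/volume-0000000a") and the raw zvol device handed to sbdadm.
def zfs_poolname(poolname, volume):
    return "%s/%s" % (poolname, volume["name"])

def zvol_rdsk_path(poolname, volume):
    return "/dev/zvol/rdsk/%s" % zfs_poolname(poolname, volume)

vol = {"name": "volume-0000000a"}
print(zfs_poolname("nova", vol))    # → nova/volume-0000000a
print(zvol_rdsk_path("nova", vol))  # → /dev/zvol/rdsk/nova/volume-0000000a
```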

I am able to create a volume using the euca-create-volume command:
localadmin@clnt2:~$ euca-describe-volumes
VOLUME vol-0000000a 10 nova available (cto_kol, controller, None, None) 2011-12-13T07:20:13Z

On the Solaris server (192.168.161.18) a ZFS volume is created:
zfs list -t volume
NAME USED AVAIL REFER MOUNTPOINT
nova/volume-0000000a 16K 1.12T 16K -
rpool/dump 2.00G 207G 2.00G -
rpool/swap 2.01G 207G 2.01G -

root@infraserver:~# itadm list-target -v
TARGET NAME STATE SESSIONS
iqn.2010-10.org.openstack:volume-0000000a online 0
        alias: -
        auth: none (defaults)
        targetchapuser: -
        targetchapsecret: unset
        tpg-tags: default

root@infraserver:~# sbdadm list-lu

Found 1 LU(s)

              GUID DATA SIZE SOURCE
-------------------------------- ------------------- ----------------
600144f03a08ca0000004ee6fb820006 10737418240 /dev/zvol/rdsk/nova/volume-0000000a

root@infraserver:~# stmfadm list-view -l 600144f03a08ca0000004ee6fb820006
View Entry: 0
    Host group : All
    Target group : tg-volume-0000000a
    LUN : 0

I have instances created on both the controller and compute hosts:

localadmin@clnt2:~$ euca-describe-instances
RESERVATION r-8fj8sy7h cto_kol kol_cto
INSTANCE i-0000000c ami-00000014 10.10.10.224 192.168.4.13 running kol_cto (cto_kol, compute) 0 m1.small 2011-12-13T06:34:15Z nova aki-00000012 ari-00000013
RESERVATION r-6ufc2kd9 cto_kol kol_cto
INSTANCE i-0000000d ami-00000014 192.168.4.14 192.168.4.14 running kol_cto (cto_kol, controller) 0 m1.small 2011-12-13T06:34:50Z nova aki-00000012 ari-00000013

Now, if I try to attach the volume to any instance using the euca-attach-volume -i <instance id> -d <device> <volume>
command, I get the following error in /var/log/nova/nova-compute.log:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2011-12-13 12:50:59,727 DEBUG nova.utils [-] Result was 255 from (pid=15957) execute /usr/lib/python2.7/dist-packages/nova/utils.py:183
2011-12-13 12:50:59,727 ERROR nova.rpc [-] Exception during message handling
(nova.rpc): TRACE: Traceback (most recent call last):
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 620, in _process_data
(nova.rpc): TRACE: rval = node_func(context=ctxt, **node_args)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 117, in decorated_function
(nova.rpc): TRACE: function(self, context, instance_id, *args, **kwargs)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1242, in attach_volume
(nova.rpc): TRACE: volume_id)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/volume/manager.py", line 245, in setup_compute_volume
(nova.rpc): TRACE: path = self.driver.discover_volume(context, volume_ref)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/volume/driver.py", line 517, in discover_volume
(nova.rpc): TRACE:
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/volume/driver.py", line 473, in _get_iscsi_properties
(nova.rpc): TRACE:
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/volume/driver.py", line 439, in _do_iscsi_discovery
(nova.rpc): TRACE: for target in out.splitlines():
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 191, in execute
(nova.rpc): TRACE: cmd=' '.join(cmd))
(nova.rpc): TRACE: ProcessExecutionError: Unexpected error while running command.
(nova.rpc): TRACE: Command: sudo iscsiadm -m discovery -t sendtargets -p controller
(nova.rpc): TRACE: Exit code: 255
(nova.rpc): TRACE: Stdout: ''
(nova.rpc): TRACE: Stderr: 'iscsiadm: Connection to Discovery Address 10.10.10.2 failed\niscsiadm: Login I/O error, failed to receive a PDU\niscsiadm: retrying discovery login to 10.10.10.2\niscsiadm: Connection to Discovery Address 10.10.10.2 failed\niscsiadm: Login I/O error, failed to receive a PDU\niscsiadm: retrying discovery login to 10.10.10.2\niscsiadm: Connection to Discovery Address 10.10.10.2 failed\niscsiadm: Login I/O error, failed to receive a PDU\niscsiadm: retrying discovery login to 10.10.10.2\niscsiadm: Connection to Discovery Address 10.10.10.2 failed\niscsiadm: Login I/O error, failed to receive a PDU\niscsiadm: retrying discovery login to 10.10.10.2\niscsiadm: Connection to Discovery Address 10.10.10.2 failed\niscsiadm: Login I/O error, failed to receive a PDU\niscsiadm: retrying discovery login to 10.10.10.2\niscsiadm: connection login retries (reopen_max) 5 exceeded\niscsiadm: Could not perform SendTargets discovery.\n'
(nova.rpc): TRACE:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The /etc/network/interfaces file of the controller is:

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
address 10.10.10.2
netmask 255.255.255.0
broadcast 10.10.10.255
#gateway 10.10.10.1
#dns-nameservers 10.10.10.3

# The primary network interface
auto eth1
iface eth1 inet dhcp

auto eth2
iface eth2 inet static
address 192.168.3.1
netmask 255.255.255.0
network 192.168.3.0
broadcast 192.168.3.255

auto eth3
iface eth3 inet static
address 192.168.161.65
netmask 255.255.255.0
network 192.168.161.0
broadcast 192.168.161.255

The /etc/network/interfaces file of the compute node is:

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
address 10.10.10.3
netmask 255.255.255.0
broadcast 10.10.10.255
#gateway 10.10.10.2
#dns-nameservers 10.10.10.3

# The primary network interface
auto eth1
iface eth1 inet dhcp

auto eth2
iface eth2 inet static
address 192.168.3.2
netmask 255.255.255.0
network 192.168.3.0
#gateway 192.168.3.1
broadcast 192.168.3.255

auto eth3
iface eth3 inet static
address 192.168.161.26
netmask 255.255.255.0
network 192.168.161.0
broadcast 192.168.161.255

Now, if I manually change the value of host from controller to infraserver in the volumes table of the nova database, I am able to attach the volume to an instance. However, this creates a problem when deleting the volume with the euca-delete-volume <volume id> command (detaching works).

Am I missing some configuration parameter in the /etc/nova/nova.conf file or some other file? Any direction is highly appreciated.

Question information

Language: English
Status: Solved
For: OpenStack Compute (nova)
Assignee: No assignee
Solved by: Nilanjan Roy
Solved: 2011-12-24
Last query: 2011-12-24
Nilanjan Roy (nilanjan-r) said : #1

Sorry for the heading... It should be "Not able to attach Solaris iSCSI volume to an instance" -- a terrible typo :(

Nilanjan Roy (nilanjan-r) said : #2

I have been debugging the issue, and the following things are noteworthy. From the file nova/volume/driver.py:

def _do_iscsi_discovery(self, volume):
        #TODO(justinsb): Deprecate discovery and use stored info
        #NOTE(justinsb): Discovery won't work with CHAP-secured targets (?)
        LOG.warn(_("ISCSI provider_location not stored, using discovery"))

        volume_name = volume['name']

        (out, _err) = self._execute('iscsiadm', '-m', 'discovery',
                                    '-t', 'sendtargets', '-p', volume['host'],
                                    run_as_root=True)

        for target in out.splitlines():
            if FLAGS.iscsi_ip_prefix in target and volume_name in target:
                return target
        return None
Here the iSCSI provider_location is not stored in the database, so discovery is used. The TODO comment mentions using stored info instead. I have checked the nova database and saw that provider_location is not stored for the created volume in the volumes table.
I have also gone through the file nova/volume/san.py, and the following lines are of interest:
~~~~~~~~~~~~~~~~~~~~~~~~~~
def create_volume(self, volume):
        """Creates a volume."""
        cliq_args = {}
        cliq_args['clusterName'] = FLAGS.san_clustername
        #TODO(justinsb): Should we default to inheriting thinProvision?
        cliq_args['thinProvision'] = '1' if FLAGS.san_thin_provision else '0'
        cliq_args['volumeName'] = volume['name']
        if int(volume['size']) == 0:
            cliq_args['size'] = '100MB'
        else:
            cliq_args['size'] = '%sGB' % volume['size']

        self._cliq_run_xml("createVolume", cliq_args)

        volume_info = self._cliq_get_volume_info(volume['name'])
        cluster_name = volume_info['volume.clusterName']
        iscsi_iqn = volume_info['volume.iscsiIqn']

        #TODO(justinsb): Is this always 1? Does it matter?
        cluster_interface = '1'

        cluster_vip = self._cliq_get_cluster_vip(cluster_name)
        iscsi_portal = cluster_vip + ":3260," + cluster_interface

        model_update = {}
        model_update['provider_location'] = ("%s %s" %
                                             (iscsi_portal,
                                              iscsi_iqn))

        return model_update
~~~~~~~~~~~~~~~~~~~~
def _do_export(self, volume, force_create):
        # Create a Logical Unit (LU) backed by the zfs volume
        zfs_poolname = self._build_zfs_poolname(volume)

        if force_create or not self._is_lu_created(volume):
            cmd = ("pfexec /usr/sbin/sbdadm create-lu /dev/zvol/rdsk/%s" %
                   (zfs_poolname))
            self._run_ssh(cmd)

        luid = self._get_luid(volume)
        iscsi_name = self._build_iscsi_target_name(volume)
        target_group_name = 'tg-%s' % volume['name']

        # Create a iSCSI target, mapped to just this volume
        if force_create or not self._target_group_exists(target_group_name):
            self._run_ssh("pfexec /usr/sbin/stmfadm create-tg %s" %
                          (target_group_name))

        # Yes, we add the initiatior before we create it!
        # Otherwise, it complains that the target is already active
        if force_create or not self._is_target_group_member(target_group_name,
                                                            iscsi_name):
            self._run_ssh("pfexec /usr/sbin/stmfadm add-tg-member -g %s %s" %
                          (target_group_name, iscsi_name))
        if force_create or not self._iscsi_target_exists(iscsi_name):
            self._run_ssh("pfexec /usr/sbin/itadm create-target -n %s" %
                          (iscsi_name))
        if force_create or not self._view_exists(luid):
            self._run_ssh("pfexec /usr/sbin/stmfadm add-view -t %s %s" %
                          (target_group_name, luid))

        #TODO(justinsb): Is this always 1? Does it matter?
        iscsi_portal_interface = '1'
        iscsi_portal = FLAGS.san_ip + ":3260," + iscsi_portal_interface

        db_update = {}
        db_update['provider_location'] = ("%s %s" %
                                          (iscsi_portal,
                                           iscsi_name))

        return db_update
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I am not sure whether the current Diablo implementation simply does not store the provider info in the database or whether this is a bug in the current release.
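For reference, when provider_location is stored, it holds a "portal iqn" pair in the format built by _do_export above, and the driver can split it back apart instead of running discovery. A minimal, self-contained sketch of that parsing (illustrative, not nova's exact code):

```python
def parse_provider_location(provider_location):
    # provider_location looks like "<ip>:<port>,<interface> <iqn>", e.g.
    # "192.168.161.18:3260,1 iqn.2010-10.org.openstack:volume-0000000a"
    (portal, iqn) = provider_location.split(" ")
    # Strip the trailing ",<interface>" to get the bare target portal.
    (target_portal, _sep, _interface) = portal.partition(",")
    return {"target_portal": target_portal, "target_iqn": iqn}

props = parse_provider_location(
    "192.168.161.18:3260,1 iqn.2010-10.org.openstack:volume-0000000a")
print(props["target_portal"])  # → 192.168.161.18:3260
print(props["target_iqn"])     # → iqn.2010-10.org.openstack:volume-0000000a
```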

justinsb (justin-fathomdb) said : #3

I'm "justinsb" so am responsible for much of this :-)

Discovery is much less reliable than simply storing it in the database, but originally discovery was used. Ideally, the iSCSI target would be stored in the database and discovery would not be used.

So there are two issues:

1) The database doesn't have the location, so discovery is being used.
2) Discovery is failing.

Yesterday I found a bug where provider_location wasn't being updated (the return value from _do_export needs to be passed up the call chain). I haven't proposed the bugfix yet; I'm not sure whether this bug was present in Diablo.

Discovery is failing because it is trying to connect to the local machine, not the Solaris machine (the iSCSI target), so, as you say, fixing the 'host' value in the database works around that. That happens because 'host' is the volume host, not the SAN target. So, because #1 is broken, discovery won't work as-is.

I think fixing #1 is the best bet; I'll try to get my patch nicely packaged up. It is fairly simple though; create_export needs to return the value from _do_export:

    def create_export(self, context, volume):
        return self._do_export(volume, force_create=True)
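To illustrate why the missing `return` matters: the volume manager applies whatever dict create_export hands back as a database update, so returning None silently drops provider_location. A simplified, self-contained sketch of that flow (the pattern follows Diablo's VolumeManager, but the classes and names here are illustrative stand-ins, not nova's real code):

```python
# In-memory stand-in for nova's DB layer; illustrative only.
class FakeDB(object):
    def __init__(self):
        self.volumes = {"vol-0000000a": {"name": "volume-0000000a"}}

    def volume_update(self, volume_id, values):
        self.volumes[volume_id].update(values)

class SketchDriver(object):
    def _do_export(self, volume):
        return {"provider_location":
                "192.168.161.18:3260,1 iqn.2010-10.org.openstack:%s"
                % volume["name"]}

    def create_export(self, context, volume):
        # The bug: without this `return`, the dict is thrown away and
        # provider_location never reaches the volumes table.
        return self._do_export(volume)

def create_volume(db, driver, volume_id):
    volume = db.volumes[volume_id]
    model_update = driver.create_export(None, volume)
    if model_update:  # None would skip the DB update entirely
        db.volume_update(volume_id, model_update)

db = FakeDB()
create_volume(db, SketchDriver(), "vol-0000000a")
print(db.volumes["vol-0000000a"]["provider_location"])
```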

Nilanjan Roy (nilanjan-r) said : #4

I have checked the file nova/volume/san.py, and the class SolarisISCSIDriver does have the create_export method you suggested, but the target information is still not stored in the database. For the time being I am using a fix of my own: I have defined a flag "iscsi_portal_ip" in the nova/volume/driver.py file and changed the _do_iscsi_discovery method as follows:

 def _do_iscsi_discovery(self, volume):
        #TODO(justinsb): Deprecate discovery and use stored info
        #NOTE(justinsb): Discovery won't work with CHAP-secured targets (?)
        LOG.warn(_("ISCSI provider_location not stored, using discovery"))

        volume_name = volume['name']
        """
        (out, _err) = self._execute('iscsiadm', '-m', 'discovery',
                                    '-t', 'sendtargets', '-p', volume['host'],
                                    run_as_root=True)
        """
        ####################Added by NILANJAN#######################
        if FLAGS.iscsi_portal_ip:
            (out, _err) = self._execute('iscsiadm', '-m', 'discovery',
                                        '-t', 'sendtargets', '-p', FLAGS.iscsi_portal_ip,
                                        run_as_root=True)
        else:
            (out, _err) = self._execute('iscsiadm', '-m', 'discovery',
                                        '-t', 'sendtargets', '-p', volume['host'],
                                        run_as_root=True)
        ############################################################
        for target in out.splitlines():
            if FLAGS.iscsi_ip_prefix in target and volume_name in target:
                return target
        return None
But I think storing the target information in the database would be the best fix. I am not sure if I am doing anything wrong; please advise.

Also, in the file nova/volume/driver.py, the class ISCSIDriver does not have any _do_export method.

Also, nova/volume/san.py has no methods for creating and deleting snapshots. Is there a plan to implement those?

Nilanjan Roy (nilanjan-r) said : #5

I am using PostgreSQL, and the datatype of the "provider_location" field of the "volumes" table in the "nova" database is "character varying(256)". I am not sure whether it is a bug in the database itself.

justinsb (justin-fathomdb) said : #6

The patch made it into trunk:
https://github.com/openstack/nova/commit/1314ee08a68e929d87ab5fdbf4cb8c4882bd5bb0

With the patch, newly created volumes should have provider_location set.

I use Postgres, so that's unlikely to be the problem.

As for ZFS snapshots, I'm sure someone will add it soon!

Nilanjan Roy (nilanjan-r) said : #7

Hi Justinsb,

I have changed san.py according to your patch, and the provider location is now updated in the database. Thanks a lot.

Is a similar solution available if I use local volumes (LVM-based volumes)? For those, the provider location is not stored in the database either.

Nilanjan Roy (nilanjan-r) said : #8

The problem is solved.