Using DB indexes instead of checking for duplicate entities by hand

Asked by Boris Pavlovic

It would be good to use DB indexes instead of checking for duplicate entities by hand in the SQLAlchemy DB API.

For example, take this code in nova/db/sqlalchemy/api.py:

@require_admin_context
def instance_type_create(context, values):
    """Create a new instance type. In order to pass in extra specs,
    the values dict should contain an 'extra_specs' key/value pair:

    {'extra_specs' : {'k1': 'v1', 'k2': 'v2', ...}}

    """
    session = get_session()
    with session.begin():
        # First DB round trip: make sure no instance type with the same
        # name exists.
        try:
            instance_type_get_by_name(context, values['name'], session)
            raise exception.InstanceTypeExists(name=values['name'])
        except exception.InstanceTypeNotFoundByName:
            pass
        # Second DB round trip: make sure no instance type with the same
        # flavorid exists.
        try:
            instance_type_get_by_flavor_id(context, values['flavorid'],
                                           session)
            raise exception.InstanceTypeExists(name=values['name'])
        except exception.FlavorNotFound:
            pass
        # Third DB round trip: the actual insert.
        try:
            specs = values.get('extra_specs')
            specs_refs = []
            if specs:
                for k, v in specs.iteritems():
                    specs_ref = models.InstanceTypeExtraSpecs()
                    specs_ref['key'] = k
                    specs_ref['value'] = v
                    specs_refs.append(specs_ref)
            values['extra_specs'] = specs_refs

            instance_type_ref = models.InstanceTypes()
            instance_type_ref.update(values)
            instance_type_ref.save(session=session)
        except Exception, e:
            raise exception.DBError(e)
        return _dict_with_extra_specs(instance_type_ref)

So before we can create a new instance_type in the DB, we must check that no instance type with the same name or flavorid already exists. That means we make 3 requests to the DB instead of 1, which I think is not a good approach.

Also, other entities have both create and update methods in the API, so such checks have to be made in both methods, and it is easy to make a mistake. For example, I've found the following code in the SQLAlchemy DB API:

@require_admin_context
def sm_backend_conf_create(context, values):
    session = get_session()
    with session.begin():
        # Duplicate check done by hand before the insert.
        config_params = values['config_params']
        backend_conf = model_query(context, models.SMBackendConf,
                                   session=session,
                                   read_deleted="yes").\
                                   filter_by(config_params=config_params).\
                                   first()

        if backend_conf:
            raise exception.Duplicate(_('Backend exists'))
        else:
            backend_conf = models.SMBackendConf()
            backend_conf.update(values)
            backend_conf.save(session=session)
    return backend_conf

@require_admin_context
def sm_backend_conf_update(context, sm_backend_id, values):
    session = get_session()
    with session.begin():
        backend_conf = model_query(context, models.SMBackendConf,
                                   session=session,
                                   read_deleted="yes").\
                           filter_by(id=sm_backend_id).\
                           first()

        if not backend_conf:
            raise exception.NotFound(
                _("No backend config with id %(sm_backend_id)s") % locals())

        # NOTE: no duplicate check here -- an update can silently produce
        # a second row with the same config_params.
        backend_conf.update(values)
        backend_conf.save(session=session)
    return backend_conf

In sm_backend_conf_update there is no code that checks that the update will not produce duplicate entities in the DB. This is an obvious mistake.

It is not so easy to use unique indexes here, because entities are not physically deleted from the DB (they are only marked as "deleted").

The best way that I've found is to use composite unique indexes like this one: (column1, column2, ..., columnN, deleted_at). When an entity is deleted, its `deleted_at` column is set to some date, so we can create a new entity with the same values of (column1, column2, ..., columnN). Since the `deleted_at` column of the new entity will be NULL by default, the unique constraint is satisfied. And if a not-deleted entity with the same parameters already exists, we cannot create another one, because `deleted_at` in both entities would be NULL.

Unfortunately this doesn't work as is, because in MySQL NULL is not equal to NULL, so a unique index never treats two rows with NULL in `deleted_at` as duplicates.
To fix this issue we can set the default value of the `deleted_at` column to, for example, '1970-01-01'. This does not break existing code, because the `deleted_at` column isn't used to check the deletion status of an entity (the `deleted` column is used instead).
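
Here is a small self-contained illustration of both behaviors, using the stdlib sqlite3 module (SQLite treats NULLs in unique constraints the same way MySQL does; the table and values are made up for the demo):

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""CREATE TABLE demo (
                    id INTEGER PRIMARY KEY,
                    name TEXT,
                    deleted_at TEXT DEFAULT '1970-01-01',
                    UNIQUE (name, deleted_at))""")

# The broken variant: explicit NULLs slip through, because NULL != NULL.
conn.execute("INSERT INTO demo (name, deleted_at) VALUES ('m1.tiny', NULL)")
conn.execute("INSERT INTO demo (name, deleted_at) VALUES ('m1.tiny', NULL)")

# With the non-NULL default the constraint does its job:
conn.execute("INSERT INTO demo (name) VALUES ('m1.small')")
try:
    conn.execute("INSERT INTO demo (name) VALUES ('m1.small')")
except sqlite3.IntegrityError:
    print('duplicate live entity rejected')

# A soft delete sets deleted_at to a real timestamp, which frees the slot
# for a new live entity with the default deleted_at:
conn.execute("UPDATE demo SET deleted_at = '2012-11-15 12:00:00' "
             "WHERE name = 'm1.small'")
conn.execute("INSERT INTO demo (name) VALUES ('m1.small')")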

So I propose a better version of the instance_type_create function.

First of all, we should add composite unique constraints on (name, deleted_at) and (flavorid, deleted_at) to the existing instance_types table:

  instance_types = Table('instance_types', meta,
        Column('created_at', DateTime),
        Column('updated_at', DateTime),
        # Non-NULL default, so the composite unique constraints can work:
        Column('deleted_at', DateTime,
               default=datetime.datetime(1970, 1, 1), nullable=False),
        Column('deleted', Boolean),
        Column('name', String(length=255)),
        Column('id', Integer, primary_key=True, nullable=False),
        Column('memory_mb', Integer, nullable=False),
        Column('vcpus', Integer, nullable=False),
        Column('swap', Integer, nullable=False),
        Column('vcpu_weight', Integer),
        Column('flavorid', String(length=255)),
        Column('rxtx_factor', Float),
        Column('root_gb', Integer),
        Column('ephemeral_gb', Integer),
        UniqueConstraint('name', 'deleted_at', name='name_index'),
        UniqueConstraint('flavorid', 'deleted_at', name='flavorid_index'),
        mysql_engine='InnoDB',
    )
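
On an existing deployment these constraints have to be retrofitted with a migration. Here is a hedged sketch using sqlalchemy-migrate (which nova uses for schema migrations); the function body is illustrative, not nova's actual migration. Existing NULLs in deleted_at are backfilled first, otherwise the old rows would still evade the index:

import datetime

from migrate.changeset.constraint import UniqueConstraint
from sqlalchemy import MetaData, Table


def upgrade(migrate_engine):
    meta = MetaData(bind=migrate_engine)
    instance_types = Table('instance_types', meta, autoload=True)

    # Backfill: give not-deleted rows the new non-NULL default.
    migrate_engine.execute(
        instance_types.update().
        where(instance_types.c.deleted_at == None).
        values(deleted_at=datetime.datetime(1970, 1, 1)))

    UniqueConstraint('name', 'deleted_at', table=instance_types,
                     name='name_index').create()
    UniqueConstraint('flavorid', 'deleted_at', table=instance_types,
                     name='flavorid_index').create()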

And then our function will look like this:

@require_admin_context
def instance_type_create(context, values):
    session = get_session()
    with session.begin():
        try:
            specs = values.get('extra_specs')
            specs_refs = []
            if specs:
                for k, v in specs.iteritems():
                    specs_ref = models.InstanceTypeExtraSpecs()
                    specs_ref['key'] = k
                    specs_ref['value'] = v
                    specs_refs.append(specs_ref)
            values['extra_specs'] = specs_refs

            instance_type_ref = models.InstanceTypes()
            instance_type_ref.update(values)
            # A single insert; the unique constraints on (name, deleted_at)
            # and (flavorid, deleted_at) now guard against duplicates.
            instance_type_ref.save(session=session)
        except exception.Duplicate:
            raise exception.InstanceTypeExists(name=values['name'])
        except Exception, e:
            raise exception.DBError(e)
        return _dict_with_extra_specs(instance_type_ref)
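
One assumption in the code above: for the except exception.Duplicate clause to fire, the model's save() has to translate the driver-level IntegrityError into exception.Duplicate. A minimal sketch of such a wrapper (hypothetical; nova's real model base class differs, and get_session() and exception come from the surrounding module):

from sqlalchemy.exc import IntegrityError


class NovaBase(object):
    def save(self, session=None):
        session = session or get_session()
        session.add(self)
        try:
            # flush() forces the INSERT, so a unique-constraint violation
            # surfaces here as IntegrityError.
            session.flush()
        except IntegrityError, e:
            raise exception.Duplicate(str(e))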

Question information

Language: English
Status: Answered
For: OpenStack Compute (nova)
Assignee: No assignee
Davanum Srinivas (DIMS) (dims-v) said (#1):

Boris, please help us by submitting changes; here's the usual process for contributions: http://wiki.openstack.org/GerritWorkflow

Boris Pavlovic (boris-42) said (#2):

Thanks for the reply.

I'm a newbie in OpenStack, and I'm wondering why indexes were not used for checking duplicates in the case I described. Maybe there are some reasons for that?

I will submit an example soon.

Boris Pavlovic (boris-42) said (#3):

I've submitted an example for review: https://review.openstack.org/#/c/16163/

Can you help with this problem?
