Possible to hack orm.py to reduce import time?

Asked by Li Wee, Ong

Hi,

I am trying to shorten the time taken for CSV import.
Looking at the source code, I was thinking it might be possible to achieve this by:
1. Removing the per-record savepoint model_load_save
2. Putting the for loop within a try/except, i.e. exiting on the first error instead of keeping track of all the errors

The modified version of the load function in orm.py is listed below.
I would appreciate any comments.

regards
Li Wee

    def load(self, cr, uid, fields, data, context=None):

        cr.execute('SAVEPOINT model_load')
        messages = []

        fields = map(fix_import_export_id_paths, fields)
        ModelData = self.pool['ir.model.data'].clear_caches()

        fg = self.fields_get(cr, uid, context=context)

        mode = 'init'
        current_module = ''
        noupdate = False

        ids = []
        try:
            for id, xid, record, info in self._convert_records(cr, uid,
                    self._extract_records(cr, uid, fields, data,
                                          context=context, log=messages.append),
                    context=context, log=messages.append):
                ids.append(ModelData._update(cr, uid, self._name,
                           current_module, record, mode=mode, xml_id=xid,
                           noupdate=noupdate, res_id=id, context=context))
        except psycopg2.Warning as e:
            _logger.exception('Failed to import record %s', record)
            messages.append(dict(info, type='warning', message=str(e)))
        except psycopg2.Error as e:
            _logger.exception('Failed to import record %s', record)
            messages.append(dict(
                info, type='error',
                **PGERROR_TO_OE[e.pgcode](self, fg, info, e)))
            # Failed to write: log to messages; the savepoint is rolled
            # back below to avoid a broken transaction.

        if len(messages) > 0:
            cr.execute('ROLLBACK TO SAVEPOINT model_load')
            ids = False
        else:
            cr.execute('RELEASE SAVEPOINT model_load')
        return {'ids': ids, 'messages': messages}
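For readers without an Odoo setup, the behavioral difference between the stock per-record handling and this modification can be sketched in plain Python. The `convert` helper and the sample rows below are hypothetical stand-ins for the ORM's record conversion; only the control flow is the point.

```python
def convert(row):
    """Pretend converter: accepts numeric strings, rejects anything else."""
    if not str(row).isdigit():
        raise ValueError("bad row: %r" % (row,))
    return int(row)

def load_collect_all(rows):
    """Stock behavior: record every error, import nothing if any occurred."""
    ids, messages = [], []
    for row in rows:
        try:
            ids.append(convert(row))
        except ValueError as e:
            messages.append(str(e))     # keep going, remember the error
    if messages:
        ids = False                     # mimic ROLLBACK TO SAVEPOINT
    return {'ids': ids, 'messages': messages}

def load_fail_fast(rows):
    """Modified behavior: the try wraps the loop, so the first error exits."""
    ids, messages = [], []
    try:
        for row in rows:
            ids.append(convert(row))
    except ValueError as e:
        messages.append(str(e))         # only the first error is reported
        ids = False
    return {'ids': ids, 'messages': messages}

rows = ['1', 'x', '3', 'y']
print(load_collect_all(rows)['messages'])  # two messages, one per bad row
print(load_fail_fast(rows)['messages'])    # one message, loop stopped early
```

Both variants refuse to return ids when anything failed; they differ only in how many errors the caller gets to see.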

Question information

Language: English
Status: Answered
For: Odoo Server (MOVED TO GITHUB)
Assignee: No assignee
Revision history for this message
Alexandre Fayolle - camptocamp (alexandre-fayolle-c2c) said :
#1

Exiting on the first error instead of trying to move on, if done at all, must be an
option left to the user. Believe me, you don't want to submit your file
again and again and again to get the errors one by one.

--
Alexandre Fayolle
Chef de Projet
Tel : + 33 (0)4 79 26 57 94

Camptocamp France SAS
Savoie Technolac, BP 352
73377 Le Bourget du Lac Cedex
http://www.camptocamp.com
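Alexandre's suggestion (keep both behaviors and let the caller choose) could be sketched with a flag in the import context. The `stop_on_first_error` key and the `load`/`to_int` helpers below are hypothetical illustrations, not an actual OpenERP option:

```python
def load(rows, convert, context=None):
    """Toy importer: collects errors by default, stops early if asked to."""
    context = context or {}
    stop_early = context.get('stop_on_first_error', False)
    ids, messages = [], []
    for row in rows:
        try:
            ids.append(convert(row))
        except ValueError as e:
            messages.append(str(e))
            if stop_early:
                break               # fail fast only when the caller opts in
    return {'ids': ids if not messages else False, 'messages': messages}

def to_int(row):
    """Stand-in converter that fails on non-numeric input."""
    return int(row)

print(load(['1', 'x', 'y'], to_int)['messages'])                                 # both errors
print(load(['1', 'x', 'y'], to_int, {'stop_on_first_error': True})['messages'])  # first only
```

The default stays foolproof for interactive users, while a migration script can opt in to the fast-failing behavior.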

Li Wee, Ong (liwee-ong) said :
#2

Hi Alexandre,

Thanks for taking the time to look at my question.

I totally understand your point of view, and in most instances exiting on the first error is definitely not the ideal solution.

However, for the initial import of data, we are looking at ~150k rows of products, which takes 5-10 hours to process.
Since we are talking about a data migration, most of the errors are systematic, i.e. solving the first error usually solves the rest.
Waiting for OpenERP to report 50k copies of the same error is a serious waste of time, in this case 5-10 hours' worth.

Naturally, I plan to revert to the old code once the system is officially deployed.
I am thinking of trying this modification on my development system tomorrow, but I was hoping to get some feedback before I start.

Thanks for your advice again.

regards
Li Wee

Kyle Waid (midwest) said :
#3

Hello,

It is no secret that the import/export operations of OpenERP are very slow. What I did in this case was write my own export module that uses direct SQL queries, bypassing the ORM export tool completely. Not only is the ORM very slow, it tries to do unnecessary calculations (stock levels, etc.). The import operation itself is atrocious: I have seen many cases where OpenERP cannot import its own export file, and I found myself getting very frustrated with the constant errors from an "import compatible" export.

In my import tool, I use the ORM only to map the special field types (m2o, o2m, etc.). The result is being able to export 10k records in 10 seconds or less and import the same number of records in 5 minutes. It is very dynamic, allowing entry of either database ids or string values.

My favorite part is that it raises a true error instead of a stack trace. I will share it soon.
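Kyle's module is not shown in this thread, but the core idea — batched SQL statements instead of per-record ORM writes — can be illustrated with the standard library. Here `sqlite3` stands in for psycopg2/PostgreSQL, and the `product` table is a made-up schema, not Odoo's:

```python
import sqlite3

# Bulk insert via direct SQL, bypassing any ORM layer.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)')

rows = [(i, 'product-%d' % i, 1.5 * i) for i in range(10000)]

# One executemany call in one transaction instead of 10k individual
# ORM create() calls: no per-record validation, no recomputation of
# stored/function fields after every row.
with conn:
    conn.executemany('INSERT INTO product (id, name, price) VALUES (?, ?, ?)', rows)

count = conn.execute('SELECT COUNT(*) FROM product').fetchone()[0]
print(count)  # 10000
```

The speed comes precisely from skipping the ORM's bookkeeping, which is also why Lionel's warning later in the thread applies: you take full responsibility for honoring the data model yourself.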

Li Wee, Ong (liwee-ong) said :
#4

Hi Kyle,

I am pretty new to OpenERP and was looking for a non-invasive method to improve processing time.
However, I would love to take a look at your custom solution when it is ready.
I appreciate your comments, thanks.

regards
Li Wee

Li Wee, Ong (liwee-ong) said :
#5

I made a few changes to the code and tested it to be working fine.
However, there seems to be no improvement in processing time.

Original code
2012-12-14 01:07:06,384 17929 INFO None openerp.addons.base_import.models: importing 999 rows...
2012-12-14 01:10:11,992 17929 INFO None openerp.addons.base_import.models: done

Modified code
2012-12-14 00:59:00,079 17724 INFO None openerp.addons.base_import.models: importing 999 rows...
2012-12-14 01:02:08,980 17724 INFO None openerp.addons.base_import.models: done
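Parsing the timestamps from the two log excerpts above with the standard library confirms that the throughput is essentially identical (the few seconds of difference are noise on a 3-minute run), which points at the savepoints and error bookkeeping not being the bottleneck:

```python
from datetime import datetime

fmt = '%Y-%m-%d %H:%M:%S,%f'

# Durations of the two runs, taken from the log lines above.
orig = (datetime.strptime('2012-12-14 01:10:11,992', fmt)
        - datetime.strptime('2012-12-14 01:07:06,384', fmt)).total_seconds()
mod = (datetime.strptime('2012-12-14 01:02:08,980', fmt)
       - datetime.strptime('2012-12-14 00:59:00,079', fmt)).total_seconds()

print(round(orig), round(mod))   # ~186 s vs ~189 s for 999 rows
print(round(999 / orig, 1))      # ~5.4 rows per second either way
```

At roughly 5 rows per second, 150k rows extrapolates to about 8 hours, consistent with the 5-10 hours reported earlier in the thread.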

Ive (weygers-ive) said :
#6

Hi Kyle,

I would also love to take a look at your custom solution for decreasing the importing/exporting time.
Thanks in advance.

regards
Ive

Lionel Sausin - Initiatives/Numérigraphe (ls-initiatives) said :
#7

There is no magic recipe that I know of.
If you use the import features or other means that go through the ORM, it's going
to be slow but foolproof, and it will work eventually.
If you import directly into the database it can be made fast, but unless
you are completely, absolutely certain about the data model, you're
going to screw it up.

Ferdinand (office-chricar) said :
#8

On 2015-03-19 15:36, Lionel Sausin - Numérigraphe wrote:

I believe the problem comes from many2one import lines.
Example: importing invoices.

  * First you import the account_invoice data.
  * Then you import the account_invoice_line data.
    For each line, all the data stored on account_invoice is
    recalculated, so the lines to read add up to 1 + 2 + 3 + ... + n = n*(n+1)/2.
    A 10-line invoice causes 10*(10+1)/2 = 55 reads/calculations
    (foolproof but slow).
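Ferdinand's count is the standard arithmetic series: importing line k re-reads the k lines stored so far, so the total grows quadratically with invoice size. A quick check of the formula:

```python
def total_line_reads(n):
    """Re-reads triggered while importing n invoice lines one at a time:
    1 + 2 + ... + n = n * (n + 1) / 2."""
    return n * (n + 1) // 2

print(total_line_reads(10))    # 55, as in the 10-line invoice example
print(total_line_reads(100))   # 5050: ten times the lines, ~100x the reads
```

The quadratic growth is why per-line recomputation hurts far more on large documents than the raw row count alone would suggest.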

A possible solution, which I never tried, is to prepare the import data as
described here:
https://doc.odoo.com/6.0/book/8/8_20_Config/8_20_Config_import_export/
(look for "new partners").

Numérigraphe (numerigraphe) said :
#9

OK, thanks for making this clearer.

Indeed, this form should let you import it all at once in a single
transaction, which may be faster.
You may also find it easier to just make a module with a YAML file,
which I suppose would have the same effect.

Another trick applies to objects with parent-child relationships: if you
use the API, you can defer the parent left/right computation by passing
a context key when writing (I can't remember which one right now, sorry).
