queXF

ICR vs OCR

Asked by Andy on 2013-12-16

Hi,

I'm using queXF 1.13.5 and I'm getting fairly poor ICR success rates, even when I re-process the same documents on which I previously trained the ICR manually. I would expect a high success rate in that scenario.

From reading other posts, I understand that the ICR process is under review and that bug fixes are scheduled for release 1.14.0, due in March 2014.

To work around this, is there a separate, more robust OCR feature that can read machine-typed field content? In other words, could I pre-fill the fields, print the forms and have queXF reliably read that pre-filled data while the ICR problems remain?

Thanks,
Andy

Question information

Language: English
Status: Solved
For: queXF
Assignee: No assignee
Solved by: Andy
Solved: 2013-12-19
Last query: 2013-12-19
Last reply: 2013-12-17



Adam Zammit (adamzammit) said on 2013-12-17:

Hi Andy,

For machine-typed content, I'd suggest either:

a. If the data can be converted to a barcode (codabar or i25), convert it, print the barcode instead of the numbers/text, and use queXF to read the barcode field. This is highly accurate.

b. Look at the code for older versions of queXF (1.11.2 and earlier) to see how they do "OCR" by exporting an image and calling an external program such as Tesseract to return the result.

Adam


Andy (andyb0070) said on 2013-12-19:

Hi Adam,

I'm doing what you suggested and attempting to shell out to Tesseract to do the OCR.

There is a deprecated function in functions.ocr.php, which I have reinstated, but on Windows 7 the following exec call fails:

exec(CONVERT_BIN . " $tmpfname.wbmp -compress none -monochrome $tmpfname.tif");

I've checked the config, and both commands run successfully when I call them from a Windows command prompt. I'm not seeing any errors from PHP.

Do you know what the problem might be? Is exec problematic on Windows 7?

Thanks,
Andreas.


Andy (andyb0070) said on 2013-12-19:

I've resolved my issue.

On Windows, the config settings have to include the full path to each executable, enclosed in double quotes, because the paths contain spaces. This works:

//Old OCR Stuff
if (!defined('CONVERT_BIN')) define('CONVERT_BIN', '"C:\\Program Files\\ImageMagick-6.8.7-Q16\\convert.exe"');
if (!defined('TESSERACT_BIN')) define('TESSERACT_BIN', '"C:\\Program Files\\Tesseract-OCR\\tesseract.exe"');
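PHP's exec() does not raise an error when the external command fails, which would explain why nothing showed up from PHP. As a minimal sketch (run_checked() is a hypothetical helper, not part of queXF), the wrapper below appends 2>&1 so that stderr is captured along with stdout, which works in both cmd.exe and POSIX shells, and uses exec()'s optional third argument to get the exit status. It assumes CONVERT_BIN is defined as in the config above.

```php
<?php
// Hypothetical helper (not part of queXF): run a shell command and report
// failures instead of silently ignoring them.
function run_checked(string $cmd): array
{
    $output = [];
    $status = 0;
    // Redirect stderr into stdout so error messages end up in $output.
    exec($cmd . ' 2>&1', $output, $status);
    if ($status !== 0) {
        // A non-zero exit status means the command failed, e.g. cmd.exe could
        // not find "C:\Program" because an unquoted path was split at the space.
        error_log("Command failed ($status): $cmd\n" . implode("\n", $output));
    }
    return [$status, $output];
}

// Usage, assuming CONVERT_BIN is defined as in the config above and
// $tmpfname is the temporary file base name used in functions.ocr.php:
// [$status, $lines] = run_checked(CONVERT_BIN . " $tmpfname.wbmp -compress none -monochrome $tmpfname.tif");
```

With the unquoted path, the logged output would contain the shell's own error message, which exec() otherwise discards.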

Furthermore, in the tesseractor function in functions.ocr.php, the tesseract call needs the page segmentation mode parameter set to 10 (-psm 10) so that the image is treated as a single character. The exec statements look like this:

//call ImageMagick
exec(CONVERT_BIN . " $tmpfname.wbmp -compress none -monochrome $tmpfname.tif");

//call tesseract
exec(TESSERACT_BIN . " $tmpfname.tif $tmpfname -psm 10");
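Tesseract writes its result to a text file named after the output base plus .txt (here $tmpfname.txt), so the last step is reading that file back. A minimal sketch, using a hypothetical read_ocr_char() helper that is not taken from queXF: it strips surrounding whitespace and returns null when no file was written or nothing was recognised. It assumes the Tesseract 3.x flag -psm shown above; Tesseract 4 and later spell it --psm.

```php
<?php
// Hypothetical helper (not part of queXF): read back the single character that
// Tesseract wrote to "$tmpfname.txt" after `tesseract $tmpfname.tif $tmpfname -psm 10`.
function read_ocr_char(string $tmpfname): ?string
{
    $resultFile = $tmpfname . '.txt';
    if (!is_file($resultFile)) {
        // Tesseract did not run, or failed before writing its output.
        return null;
    }
    // Strip trailing newlines and any form feed along with ordinary whitespace.
    $text = trim(file_get_contents($resultFile), " \t\n\r\0\x0B\f");
    // Remove the result file so stale output is never read for the next box.
    unlink($resultFile);
    // An empty result means Tesseract found no character in the image.
    return $text === '' ? null : $text;
}
```

Returning null rather than an empty string lets the caller treat "no file" and "nothing recognised" the same way, for example by leaving the field for manual verification.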
