Improvement in OCR recognition with small images

Asked by Jose Damian

I've been doing some OCR text recognition tests with sikuli+tesseract, and the results have been quite poor. Probably the main problem is that the font in the images is very small (8 pixels high), with a total image height of 20 pixels.

After training tesseract with samples of this font, the images were correctly identified by tesseract alone, but when analyzed by sikuli the results were much worse. It seems that image preprocessing differs between sikuli and the tesseract executable, the latter being better for small images.

I think there is a problem with the resize operation that sikuli VisionProxy performs on images with a height of less than 30 pixels. It currently uses nearest-neighbor interpolation (INTER_NEAREST), whereas opencv recommends bicubic interpolation over a 4x4 pixel neighborhood (INTER_CUBIC) for enlarging images.

Changing this resize operation in tessocr.cpp improved my results in sikuli somewhat:
---------------
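if (in_img.rows < MIN_HEIGHT){
   // smallest integer factor that brings the height up to MIN_HEIGHT
   scale = ceil(MIN_HEIGHT / float(in_img.rows));
   // enlarge with bicubic interpolation instead of INTER_NEAREST
   resize(in_img, out_img, Size(in_img.cols*scale,in_img.rows*scale), 0, 0, INTER_CUBIC);
}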
---------------
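
As a rough check of the arithmetic in the snippet above, here is a small standalone sketch of the scale computation that does not need opencv. It assumes MIN_HEIGHT is 30 (inferred from the 30-pixel threshold mentioned above, not confirmed against the source); `Dim`, `scale_factor` and `scaled_size` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Assumption: the threshold is 30 pixels, as described in the post.
const int MIN_HEIGHT = 30;

// Hypothetical helper type: image width and height in pixels.
struct Dim {
    int cols;
    int rows;
};

// Mirrors the scale computation from the snippet: the smallest integer
// factor that brings the height up to at least MIN_HEIGHT.
int scale_factor(int rows) {
    assert(rows > 0);
    if (rows >= MIN_HEIGHT) return 1;  // tall enough, no enlargement
    return static_cast<int>(std::ceil(MIN_HEIGHT / static_cast<float>(rows)));
}

// Size of the enlarged image when both dimensions are multiplied by the factor.
Dim scaled_size(Dim in) {
    const int s = scale_factor(in.rows);
    return Dim{in.cols * s, in.rows * s};
}
```

Under that assumption, a 20-pixel-high image gets a factor of 2 and ends up 40 pixels high, so the output can overshoot the minimum height rather than hit it exactly.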

Anyway, I think it is a mistake to use interpolation to enlarge these small images. Given that an image height of at least 30 pixels is required, the best solution for me has been to enlarge the image by adding pixels to the border:
---------------
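if (in_img.rows < MIN_HEIGHT){
   scale = ceil(MIN_HEIGHT / float(in_img.rows));
   // pad on the bottom and right only, repeating the edge pixels; the original pixels are not resampled
   copyMakeBorder(in_img, out_img, 0, (scale-1)*in_img.rows, 0, (scale-1)*in_img.cols, BORDER_REPLICATE);
}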
---------------

This solution achieves near perfect recognition with my image samples.
I don't know if this is the right treatment for all kinds of small images and could be included in a future release; but at least it is a change that people can try if they have problems recognizing small fonts.
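
To make the padding idea concrete without needing opencv, here is a minimal sketch of replicate-style padding on the bottom and right edges. It is only an illustration of the behaviour expected from BORDER_REPLICATE, not the library's implementation; `Gray` and `pad_replicate` are hypothetical names, and the image is assumed to be single-channel and rectangular.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a single-channel image: rows of gray values.
using Gray = std::vector<std::vector<unsigned char>>;

// Pads `in` on the bottom and right only, repeating the nearest edge pixel.
// The original pixels stay untouched in the top-left corner of the result.
Gray pad_replicate(const Gray& in, std::size_t bottom, std::size_t right) {
    assert(!in.empty() && !in[0].empty());
    const std::size_t rows = in.size();
    const std::size_t cols = in[0].size();
    Gray out(rows + bottom, std::vector<unsigned char>(cols + right));
    for (std::size_t r = 0; r < out.size(); ++r) {
        for (std::size_t c = 0; c < out[r].size(); ++c) {
            // Clamp coordinates to the last valid row/column of the input.
            out[r][c] = in[std::min(r, rows - 1)][std::min(c, cols - 1)];
        }
    }
    return out;
}
```

Because no pixel of the original glyphs is resampled, the small font keeps its exact shape; the image only gains extra rows and columns that repeat its last row and column.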

Question information

Language: English
Status: Solved
For: SikuliX
Assignee: No assignee
Solved by: RaiMan

Eugene S (shragovich) said :
#1

Great input!

rob (reg82) said :
#2

I would like to experiment with this, but I'm not clear on how or where to implement these changes.

Jose Damian (josedamianflor) said :
#3

The change is in libVisionProxy 1.0.1. I downloaded it for Linux from https://launchpad.net/sikuli/+download
The file to modify is src/tessocr.cpp, in the function preprocess_for_ocr, around line 988.

Of course, after the change you need to compile the library and put it in the libs directory of sikuli.

Best RaiMan (raimund-hocke) said :
#4

@Jose
Many thanks for the input.

I will integrate it into the development of the new version 1.1.0 and test it in the next few days.

For Windows and Mac I can then provide the revised native pack for download, along with the beta versions of 1.1.

For Linux, the approach with Sikuli-1.x.x-Supplemental-LinuxVisionProxy can be used; it contains the current sources and a base script for building the lib.

Aravind (a4aravind) said :
#5

Is this applicable only to Linux? Are there any such workarounds for the Windows platform?

Thanks!

Aravind (a4aravind) said :
#6

Oops, I missed your comment #4, RaiMan.
Thanks

Jose Damian (josedamianflor) said :
#7

Thanks RaiMan, that solved my question.

fasatrix (fasatrix) said :
#8

Hi RaiMan,
Any news about when the revised native pack will be available for download (beta version 1.1)?

Thank you in advance
Cheers

RaiMan (raimund-hocke) said :
#9

Not done yet. You will have to wait.

Carlos Garcia (calgarcia) said :
#10

@Jose Damian

Hello Jose, I've shared an image with you through SkyDrive. Could you please help me with OCR for that image?

Thanks a lot,

Carlos