OCR usage and integration with Sikuli

Asked by akbar

I would like to know whether it is possible to use another OCR engine with Sikuli and, if so, how to integrate it. Most of my use cases require text to be read and acted upon dynamically, so I want to experiment with other available OCR engines. I have read about the other options (clipboard, ...), but I would still like a more robust text identification mechanism.

Question information

Language:
English
Status:
Solved
For:
SikuliX
Assignee:
No assignee
Solved by:
RaiMan

RaiMan (raimund-hocke) said :
#1

--- integration of another OCR engine ...
... is only possible if you step down to the native source code level, because the interface to the Tesseract 2 features in use is not cleanly isolated, so you would have to make changes and adaptations in several different areas. The main functions exposed towards the Java API are in the Java class TextRecognizer.

--- to just check the possibilities of other OCR engines ...
... use ImageMagick to prepare a screenshot for optimal OCR and then pass the resulting image to the OCR system. This is a 2-step workflow that can be run on the command line.
From Sikuli, the fastest way to test this is to use the IDE with some Python scripting, calling the OCR step via os.popen(). I have done this once with ImageMagick and Tesseract 3.

If you do it in Java, it might be possible to use BufferedImages, so that you would not have to write intermediate image files (which slows the process down).

If these tests turn out satisfactory, you can then decide how to proceed.

akbar (mohammed-akbar-ali) said :
#2

Thanks RaiMan.

I am using the Java library of Sikuli. Can you please give more details about the usage of TextRecognizer, ImageMagick, and BufferedImages?

akbar (mohammed-akbar-ali) said :
#3

Rephrasing the question.

Integration: modifying the source code is costly and complex.

Other OCR: I assume you meant that the ImageMagick + Sikuli workflow can be done in Java and that BufferedImages can also be used? Can you please elaborate?

Best RaiMan (raimund-hocke) said :
#4

Both ImageMagick (convert: works on images) and Tesseract (tesseract: gets text from an image) provide commands that can be used on the command line, taking files as input and producing files as output.

So in principle a shell script
convert input.png ... some actions output.tiff   # optimize for OCR
tesseract output.tiff ... additional options

would in the end create a text file containing what Tesseract could read.

This shell script can be run from a Java program that produces input.png using Sikuli features and finally reads output.txt to get the textual result.
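
As a rough illustration of that workflow, the two commands could be driven from Java with ProcessBuilder. This is only a sketch under assumptions: the class and method names are made up for the example, the -colorspace Gray and -resize 300% options merely stand in for the "some actions" placeholder, and it presumes that convert and tesseract are on the PATH (newer ImageMagick releases may expect the magick command instead). Producing input.png with Sikuli is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class OcrPipeline {

    // Step 1: ImageMagick command line. The grayscale and resize options are
    // illustrative assumptions, not the settings used in the thread.
    static List<String> convertCommand(Path input, Path tiff) {
        return Arrays.asList("convert", input.toString(),
                "-colorspace", "Gray", "-resize", "300%", tiff.toString());
    }

    // Step 2: Tesseract command line; the result is written to <outputBase>.txt.
    static List<String> tesseractCommand(Path tiff, Path outputBase) {
        return Arrays.asList("tesseract", tiff.toString(), outputBase.toString());
    }

    // Runs one external command and fails if it returns a non-zero exit code.
    static void run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException(command.get(0) + " exited with code " + exitCode);
        }
    }

    // Runs both steps on input and returns the text Tesseract recognized.
    static String recognize(Path input, Path workDir)
            throws IOException, InterruptedException {
        Path tiff = workDir.resolve("output.tiff");
        Path outputBase = workDir.resolve("output");
        run(convertCommand(input, tiff));
        run(tesseractCommand(tiff, outputBase));
        byte[] bytes = Files.readAllBytes(workDir.resolve("output.txt"));
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

A caller would save the Sikuli screenshot as input.png, call OcrPipeline.recognize(...) with that path and a scratch directory, and work with the returned string.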

At least the first step (convert) can be done in memory using JMagick and BufferedImages (which you can also get from Sikuli).
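
To give an idea of what in-memory preparation might look like, here is a minimal sketch that uses plain Java2D instead of JMagick (a swapped-in technique, chosen only so the example needs no extra library). The class and method names are hypothetical, the BufferedImage is assumed to come from a Sikuli screen capture, and grayscale plus enlargement is just one plausible preparation for OCR.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

class OcrPreprocess {

    // Returns a grayscale copy of src, enlarged by the given integer factor.
    static BufferedImage grayscaleAndScale(BufferedImage src, int factor) {
        int width = src.getWidth() * factor;
        int height = src.getHeight() * factor;
        // Drawing into a TYPE_BYTE_GRAY image converts the pixels to grayscale.
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = out.createGraphics();
        try {
            // Bicubic interpolation keeps glyph edges smoother when enlarging.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

The result can then be written once with javax.imageio.ImageIO.write(image, "png", file) and handed to the OCR command, so only a single file is created instead of one per step.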

akbar (mohammed-akbar-ali) said :
#5

Thanks RaiMan, that solved my question.