What Tesseract-related improvements are expected with Sikuli 2?

Asked by Eugene S on 2017-03-21

Hi RaiMan,

Questions about Tesseract performance have been asked here many times, and in one of your answers you mentioned that some improvements are expected in Sikuli 2.

As far as I understand, the issues most people experience with Sikuli + Tesseract (unreliable results...) are just a result of how Tesseract itself works, so the poor results from Tesseract-based detection have nothing to do with Sikuli. If that's the case, I would be really interested to know what improvements you are planning to introduce.

I would think that the only way to go is to configure and train tesseract itself.


RaiMan (raimund-hocke) said : #1

Since the Tesseract implementation in Sikuli X1.0RC3 (2012 ;-) nothing has been changed internally.
Everything is still based on Tesseract 2 features, though currently Tesseract 3 is used.

What has to be done (besides using Tess4J):
- optimize the internal workflow to prepare the image for OCR/text-find
- allow the usage of Tesseract options/parameters and try to evaluate the optimal options setup beforehand
- support language selection and setup
- support training
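
To make the first item a bit more concrete, here is a minimal, purely illustrative sketch of what "preparing the image" could mean: enlarge the pixels and reduce them to black and white before OCR. This is not Sikuli or Tesseract code. The function names (`upscale`, `binarize`, `prepare_for_ocr`), the scale factor and the fixed threshold are all assumptions, and the "image" is just a list of rows of grayscale values (0..255).

```python
# Hypothetical sketch: not Sikuli or Tesseract code.
# An "image" here is a list of rows, each row a list of grayscale values 0..255.

def upscale(image, factor=2):
    """Enlarge the image by an integer factor using nearest-neighbour copying."""
    scaled = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # Repeat each widened row `factor` times (copied so rows stay independent).
        scaled.extend(list(wide_row) for _ in range(factor))
    return scaled


def binarize(image, threshold=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[255 if pixel >= threshold else 0 for pixel in row] for row in image]


def prepare_for_ocr(image, factor=2, threshold=128):
    """Upscale first, then binarize; the order and the values are assumptions."""
    return binarize(upscale(image, factor), threshold)


tiny = [[10, 200],
        [130, 90]]
prepared = prepare_for_ocr(tiny)
```

A real workflow would most likely pick the factor and threshold per image instead of hard-coding them, which is part of what "optimize" means in the list above.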

I did not yet dive deeper into this area, so this must be rather vague for now.

Anyway, I will only get to this area later this year.

Eugene S (shragovich) said : #2

Thanks for the quick reply!

So basically what you are planning to do is just delegate the Tesseract internals to the Sikuli user? Am I right? Meaning that the user gets more access to Tesseract configuration through Sikuli.


Best RaiMan (raimund-hocke) said : #3

-- is just delegate the Tesseract internals to Sikuli user?
yes, but this should only be needed for special cases.
The normal approach should be (like now), that you say

someText = someRegion.text()

someText = someRegion.text(parameters)

where parameters is either a list of options or a reference to an option set.

Options that are not specified are either preset internally or evaluated on the fly based on the given image (e.g. character/line/box mode, layout evaluation, ...)

... and all this based on an image, that is optimized for OCR before giving it to Tesseract.
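
A rough sketch of how such a fallback could work, assuming a simple precedence of internal presets, then guesses from the image, then whatever the user passed explicitly. None of these helpers exist in Sikuli: `PRESETS`, `guess_from_image` and `resolve_options` are hypothetical names, and the width/height heuristic is only a stand-in for a real layout evaluation. The keys are Tesseract variable names, and the page segmentation values follow Tesseract's numbering (3 = fully automatic, 7 = single line, 10 = single character).

```python
# Hypothetical sketch: none of these helpers exist in Sikuli.
# The keys are Tesseract variable names; the preset values are assumptions.

PRESETS = {
    "tessedit_pageseg_mode": "3",   # Tesseract's fully automatic page segmentation
    "tessedit_char_whitelist": "",  # empty means no character restriction
}


def guess_from_image(width, height):
    """Guess options from the region size (a deliberately crude heuristic)."""
    if height > 0 and width >= 5 * height:
        # Very wide and flat: treat it as a single text line (mode 7).
        return {"tessedit_pageseg_mode": "7"}
    return {}


def resolve_options(width, height, user_options=None):
    """Later sources win: presets < image-based guesses < explicit user options."""
    options = dict(PRESETS)
    options.update(guess_from_image(width, height))
    options.update(user_options or {})
    return options


# A wide, flat region such as a button label: the line mode is guessed.
button_label = resolve_options(200, 20)
# The same region, but the user forces single-character mode.
forced_char = resolve_options(200, 20, {"tessedit_pageseg_mode": "10"})
```

With this ordering, a plain `text()` call would rely entirely on presets and guesses, while an explicit option always overrides both.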

Eugene S (shragovich) said : #4

Thanks RaiMan, that solved my question.