SikuliX

OCR settings to get correct text from a region

Asked by Vin Uppinkudru on 2018-03-07

Hi RaiMan,
Apologies if this question is a sort of repeat.
SikuliX version: 1.1.1
Language: Java
I am using get text on a region and it works OK most of the time, but sometimes it gives me incorrect text.
For example,
number 1 is read as "'I".
It looks like some non-English character.
How can I set the language to English only,
and restrict text recognition to letters (A-Z, a-z) and numbers (0-9) only?

I am hoping these restrictions will yield more accurate results.
The other thing to note is that this feature was quite stable in SikuliX 1.0.1.

Thanks for your help.

Question information

Language:: English

Status:: Solved

For:: SikuliX

Assignee:: No assignee

Solved by:: Vin Uppinkudru

Solved:: 2018-03-08

Last query:: 2018-03-08

Last reply:: 2018-03-07


RaiMan (raimund-hocke) said on 2018-03-07:

There are 2 classes where text recognition based on Tesseract is handled:

org.sikuli.script.TextRecognizer:
How to use it can be seen in the method Region.text().
FAQ 2709 describes how to switch the language (be aware: the default language is eng (English)).

org.sikuli.natives.Vision:
This implements the features based on the native Tesseract library and JNI (C++).
There is nothing to do here, except that you can try to add Tesseract-specific settings to the environment using
setParameter(String param, float val)
setSParameter(String param, String val)

For the possible parameters and their values, consult the Tesseract docs (Tesseract 3 is used).
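As a sketch of how the letters-and-digits restriction from the question might fit in (not tested in this thread): Tesseract 3 has a variable `tessedit_char_whitelist` that limits the characters it may output. Assuming `Vision.setSParameter` is static and actually forwards its value to Tesseract, it could be set that way before calling `Region.text()`. Because that path is unverified, the Java below applies the same restriction as a plain post-filter on the returned string; the class `OcrWhitelist` and method `keepWhitelisted` are hypothetical names.

```java
/**
 * Sketch: keep only A-Z, a-z and 0-9 from an OCR result.
 * Class and method names are hypothetical, not part of SikuliX.
 */
public class OcrWhitelist {

    // The characters the question wants to allow.
    static final String WHITELIST =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            + "abcdefghijklmnopqrstuvwxyz"
            + "0123456789";

    // Assumption (unverified): the same whitelist could be handed to
    // Tesseract 3 before reading text, e.g.
    //   org.sikuli.natives.Vision.setSParameter("tessedit_char_whitelist", WHITELIST);

    /** Drops every character of the OCR result that is not in WHITELIST. */
    static String keepWhitelisted(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (char c : raw.toCharArray()) {
            if (WHITELIST.indexOf(c) >= 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The quote, the space and the hyphen are removed.
        System.out.println(keepWhitelisted("'I 23-b")); // prints I23b
    }
}
```

The filter only removes stray characters such as the leading quote in "'I"; it cannot turn a misread `I` back into `1`. Spaces are dropped as well, so add `' '` to `WHITELIST` if word boundaries matter.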

Working on this level, you might find a way to optimize your results, but be aware: it might be necessary to implement your own text-read code based on the above 2 classes and their implementation.
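If you do end up writing your own text-read step, one option (a common OCR post-processing trick, not something SikuliX provides) is to correct the output of fields you know are numeric by mapping look-alike letters to digits. A minimal sketch; the class `NumericOcrFix`, the method `toDigits` and the mapping table are hypothetical and should be tuned to the misreads you actually observe:

```java
/**
 * Sketch: clean up OCR output for a field that should contain only digits.
 * Names and the look-alike table are hypothetical, not part of SikuliX.
 */
public class NumericOcrFix {

    // Parallel strings: LOOKALIKES_FROM.charAt(i) is read as LOOKALIKES_TO.charAt(i).
    private static final String LOOKALIKES_FROM = "Il|OoSB";
    private static final String LOOKALIKES_TO   = "1110058";

    /** Maps look-alike letters to digits and drops every other non-digit. */
    static String toDigits(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (char c : raw.toCharArray()) {
            int i = LOOKALIKES_FROM.indexOf(c);
            char mapped = (i >= 0) ? LOOKALIKES_TO.charAt(i) : c;
            // Only ASCII digits survive (Character.isDigit would also accept other scripts).
            if (mapped >= '0' && mapped <= '9') {
                sb.append(mapped);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toDigits("'I"));    // prints 1
        System.out.println(toDigits("2O'I7")); // prints 2017
    }
}
```

Only apply this to regions that should contain digits: on ordinary text it mangles words, for example `toDigits("Box")` returns "80".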


Vin Uppinkudru (neouppin) said on 2018-03-08:

Thanks RaiMan. FAQ 2709 helped, but there was no difference in the results. I think it is just how Tesseract is. Thanks
