OCR settings to get correct text from a region

Asked by Vin Uppinkudru

Hi RaiMan,
Apologies if this question is something of a repeat.
Sikuli version 1.1.1
Language - Java
I am using get text on a region and it works OK most of the time, but sometimes it gives me incorrect text.
For example
Number 1 is read as "'I".
It looks like some non-English character.
How can I set the language to English only?
And how can I restrict text recognition to letters (A-Z, a-z) and digits (0-9)?

I am hoping these restrictions would yield more accurate results.
The other thing to note is that this feature was working quite reliably in Sikuli 1.0.1.

Thanks for your help.

Question information

Language: English
Status: Solved
For: SikuliX
Assignee: No assignee
Solved by: Vin Uppinkudru

RaiMan (raimund-hocke) said :
#1

There are 2 classes where Tesseract-based text recognition is handled:

org.sikuli.script.TextRecognizer:
How to use it can be seen in the method Region.text().
FAQ 2709 explains how to switch the language (be aware: the default language is eng (English)).

org.sikuli.natives.Vision:
This implements the features based on the native Tesseract library via JNI (C++).
Nothing to do here, except that you can try to pass Tesseract-specific settings to the environment using
setParameter(String param, float val)
setSParameter(String param, String val)

For the possible parameters and their values, you have to consult the Tesseract docs (Tesseract 3 is used).

Working at this level, you might find a way to optimize your results, but be aware: it might be necessary to implement your own text-reading code based on the above 2 classes and their implementation.
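
If the goal is only to end up with letters and digits, one possible fallback that needs no Tesseract setting at all is to filter the string after it has been read. The following is a minimal sketch under that assumption: OcrTextFilter and keepAlphanumeric are hypothetical names, not part of SikuliX or Tesseract, and the sample input stands in for whatever Region.text() returned.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of SikuliX or Tesseract.
final class OcrTextFilter {

    // Matches every character that is not A-Z, a-z, 0-9 or whitespace.
    private static final Pattern NOT_ALLOWED = Pattern.compile("[^A-Za-z0-9\\s]");

    private OcrTextFilter() {
    }

    // Removes all characters outside the allowed set and trims the result.
    static String keepAlphanumeric(String raw) {
        if (raw == null) {
            return "";
        }
        return NOT_ALLOWED.matcher(raw).replaceAll("").trim();
    }

    public static void main(String[] args) {
        // In real use the argument would be the string read from the region.
        System.out.println(keepAlphanumeric("'I"));
    }
}
```

Be aware: this only removes stray characters such as the leading quote. It cannot turn a misread "I" back into "1", so it does not replace better recognition.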

Vin Uppinkudru (neouppin) said :
#2

Thanks RaiMan. FAQ 2709 helped, but there was no difference in the results. I think it is just how Tesseract is. Thanks.