Accessing different language packs using Tesseract

Asked by dinesh

Hello

I downloaded SikuliX 1.1.0 from http://nightly.sikuli.de/, but I still do not get an option to set the language using Settings.OcrLanguage.

Under Settings I can see only Settings.OcrTextSearch and Settings.OcrTextRead.

Moreover, calling org.sikuli.script.TextRecognizer.reset() does not work. It fails with 'The method reset() is undefined for the type TextRecognizer'.

My scenario is:

(i) I need to run my tests using different language packs, such as English (eng) and Swedish.

(ii) I also observed that even English lowercase characters are not read exactly as they appear. Sometimes Sikuli reads 'c' as 'o' and 'i' as 'l'. Is there any workaround to read the characters exactly as they are shown on the screen?

Kindly help us in this regard.

Question information

Language: English
Status: Answered
For: SikuliX
Assignee: No assignee
RaiMan (raimund-hocke) said :
#1

--- Under Settings i could see only Settings.OcrTextSearch,Settings.OcrTextRead .
Where did you look?

Here? http://nightly.sikuli.de/docs/index.html

This should work:

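Settings.OcrLanguage = "otherLanguage"  # Tesseract language code to use for OCR

import org.sikuli.script.TextRecognizer as TR
TR.reset()  # re-initialize the TextRecognizer so the new language is picked up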

otherLanguage must follow Tesseract's language naming rules, and the matching language data must be in the folder
<SikulixAppData>/SikulixTesseract/tessdata

SikulixAppData: http://sikulix-2014.readthedocs.org/en/latest/basicinfo.html#some-general-aspects
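A minimal sketch in plain Python (it should run under Jython 2.7 as well as CPython) for checking that the language data is actually in place before switching Settings.OcrLanguage. It assumes, as described above, one <lang>.traineddata file per language inside <SikulixAppData>/SikulixTesseract/tessdata; the helper names tessdata_dir and missing_languages are hypothetical and not part of SikuliX.

```python
import os

# Assumption: SikuliX expects Tesseract language data in
# <SikulixAppData>/SikulixTesseract/tessdata, one "<lang>.traineddata" per language.
def tessdata_dir(sikulix_appdata):
    """Return the tessdata folder below a given SikulixAppData folder."""
    return os.path.join(sikulix_appdata, "SikulixTesseract", "tessdata")


def missing_languages(tessdata, languages):
    """Return the language codes that have no <code>.traineddata file in tessdata."""
    missing = []
    for code in languages:
        # Tesseract finds a language by file name, e.g. "eng" -> eng.traineddata
        path = os.path.join(tessdata, code + ".traineddata")
        if not os.path.isfile(path):
            missing.append(code)
    return missing
```

For example, missing_languages(tessdata_dir(appdata), ["eng", "swe"]) returns the codes that still need a traineddata file, where appdata is whatever SikulixAppData resolves to on your machine (see the link above) and "swe" is the code used by the standard Tesseract Swedish data.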

--- even the english characters(lowercase) are not read exactly
This may well depend on font characteristics; you have to live with it.

... or produce your own traineddata using the Tesseract training tools and put it into the tessdata folder in SikulixAppData.

... but SikuliX offers no special support for any of these preparation steps.
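If exact recognition cannot be achieved, one possible workaround on the test side (not a SikuliX feature, and not something suggested in the reply above) is to make text comparisons tolerant of the known confusions. A minimal sketch in plain Python, assuming the pairs reported in the question ('c'/'o' and 'i'/'l') are the only ones that matter; CONFUSABLE, normalize and ocr_matches are hypothetical names.

```python
# Hypothetical table, not part of SikuliX: each confusable character is mapped
# to one representative, here for the pairs 'c'/'o' and 'l'/'i' from the question.
CONFUSABLE = {"c": "o", "l": "i"}


def normalize(text, table=CONFUSABLE):
    """Replace every confusable character with its representative."""
    return "".join(table.get(ch, ch) for ch in text)


def ocr_matches(expected, ocr_text, table=CONFUSABLE):
    """True if both strings are equal once surrounding whitespace is stripped
    and confusable characters are merged."""
    return normalize(expected.strip(), table) == normalize(ocr_text.strip(), table)
```

For example, ocr_matches("click", "olick") is True. The trade-off is that genuinely different words that differ only in those characters also compare equal, so the table should stay as small as possible.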

dinesh (dperomy) said :
#2

Hi RaiMan,

Thanks for your quick response.

After adding the latest version of SikuliX 1.1.0, I get the following error when running the test:

[error] TextRecognizer not working: tessdata stuff not available at:..../Library/Application Support/Sikulix/SikulixTesseract/tessdata.

I did not get this error with the older version. It started after adding the new Sikuli jar that contains the TextRecognizer changes.

We are not running our tests locally: all our jars and library files are kept in a central repository. To use the new version, I simply replaced the old jar with the new one in the existing library.

Am I doing anything wrong with the setup?

Kindly assist me in this regard.

Regards
Dinesh

RaiMan (raimund-hocke) said :
#3

Answering in the new question you posted.
