SikuliX

New (Norwegian) Tesseract training set crashes Sikuli?

Asked by Audun Mathias Øygard on 2011-06-09

Hi,

I'm having trouble getting a Tesseract training set for Norwegian to work in Sikuli.

The training set was created for Tesseract 2.04 as described here:
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract2

My training set works with Tesseract itself, but when I replace the English training set in sikuli-script.jar with my own, Sikuli crashes whenever I capture an image or try to read the text in an image. Since my training set includes non-English characters (æ, ø, å), I wonder whether they are the reason Sikuli crashes. Or is there another, "proper" way of doing this?

The files I've replaced (with identically named files) are:
/tessdata/eng.freq-dawg
/tessdata/eng.inttemp
/tessdata/eng.normproto
/tessdata/eng.pffmtable
/tessdata/eng.unicharset
/tessdata/eng.user-words
/tessdata/eng.word-dawg
/tessdata/eng.DangAmbigs

This happens with Sikuli X.RC2 on both Ubuntu and Windows Vista.
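
The file swap described above can be scripted so it is repeatable. The following is only a minimal sketch: the function name `replace_jar_entries` is hypothetical, and it assumes the entries are stored inside the jar without a leading slash (for example `tessdata/eng.unicharset`), which should be checked against the actual archive. It writes a new jar rather than modifying the original, so the untouched jar remains available for comparison.

```python
import zipfile


def replace_jar_entries(jar_path, out_path, replacements):
    """Copy jar_path to out_path, swapping selected entries.

    replacements maps archive entry names (assumed form:
    "tessdata/eng.unicharset") to paths of local files whose bytes
    should be stored under that name instead.
    Returns the sorted list of entry names that were actually replaced.
    """
    replaced = []
    with zipfile.ZipFile(jar_path) as src, \
            zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename in replacements:
                # Store the bytes of the local replacement file.
                with open(replacements[info.filename], "rb") as f:
                    dst.writestr(info.filename, f.read())
                replaced.append(info.filename)
            else:
                # Copy every other entry unchanged.
                dst.writestr(info, src.read(info.filename))
    return sorted(replaced)
```

Comparing the returned list with the eight file names above shows whether any entry name failed to match and was therefore left as the English original.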

Question information

Language: English

Status: Solved

For: SikuliX

Assignee: No assignee

Last query: 2011-06-09

Last reply: 2011-06-10

Audun Mathias Øygard (amoygard) said on 2011-06-09:

I can confirm that my training set works when I use only English characters, so I assume the problem has to do with non-English characters or UTF-8 handling.
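
To narrow down which characters are involved, it may help to list the non-ASCII entries in the unicharset file. The sketch below is an illustration under assumptions, not a confirmed diagnosis: the helper name `non_ascii_entries` is hypothetical, and it assumes the file's first line is an entry count and each following line starts with the character itself, followed by whitespace and its properties.

```python
def non_ascii_entries(unicharset_text):
    """Return the unicharset entries that contain non-ASCII characters.

    Assumed layout: the first line is the entry count; every later
    line begins with the character, then whitespace-separated fields.
    """
    found = []
    for line in unicharset_text.splitlines()[1:]:
        parts = line.split()
        # The first token is taken to be the character entry.
        if parts and any(ord(ch) > 127 for ch in parts[0]):
            found.append(parts[0])
    return found
```

With Python 3 the file can be read as `open(path, encoding="utf-8").read()` before being passed in. Removing the reported entries one at a time from a test training set would show whether any single character triggers the crash.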

Another issue that came up is that Sikuli does not properly handle characters that my training set (correctly) detects as bold. Bold characters are output with an @ symbol in front of them, so a bold a would be "@a", but Sikuli outputs only the @ symbol.

I guess I'll have to remove detection of bold characters from my training set, but it would be nice if it worked.

RaiMan (raimund-hocke) said on 2011-06-10:

Thanks for the information and evaluation.

I will file this as a request bug and add it to the OCR summary bug 710586.

RaiMan (raimund-hocke) said on 2011-06-10:

This is now bug 795391.
