SikuliX

How to improve OCR accuracy or give it a font

Asked by Kamil on 2016-09-24

The OCR in SikuliX is not the best: about 20% of the scans I do are incorrect. Is there a way to choose the font the game is using to improve accuracy? Or even to select individual characters (a, b, c, etc.) so that when it sees a character, it knows what it is? Or any other way to improve OCR accuracy? Also, is there a way to use an online OCR service with Sikuli (https://www.newocr.com/)?

In short, how can I improve OCR accuracy or use an online OCR instead with Sikuli?

Question information

Language: English

Status: Answered

For: SikuliX

Assignee: No assignee

Last query: 2016-09-24

Last reply: 2016-09-25

RaiMan (raimund-hocke) said on 2016-09-24:

You might use the Tesseract 3 features to learn the font and then add the resulting files to Sikuli's tessdata folder.

To use the website, you have to store your image and create some automation to control the web workflow.

Kamil (k-danek) said on 2016-09-25:

I don't know anything about Tesseract. Could you tell me how I can make it learn a font? Also, where is Sikuli's tessdata folder usually located? I also don't think I'll go with online OCR because it will take too much time.

RaiMan (raimund-hocke) said on 2016-09-25:

To learn more about Tesseract (also used by the mentioned website), look here:
https://github.com/tesseract-ocr/tesseract

If you have installed SikuliX with Tesseract/OCR support, the tessdata folder location is shown by:
print Settings.OcrDataPath
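
For the font-training route mentioned earlier, the last step is putting the trained file into that folder. A minimal sketch, assuming the training produced a `.traineddata` file and that you pass in the folder printed above; `install_traineddata` is an illustrative helper, not part of SikuliX or Tesseract:

```python
import os
import shutil


def install_traineddata(traineddata_file, tessdata_dir):
    """Copy a trained language file into a tessdata folder; return the new path."""
    # Hypothetical helper: both arguments are plain file system paths.
    if not traineddata_file.endswith(".traineddata"):
        raise ValueError("expected a .traineddata file: %s" % traineddata_file)
    if not os.path.isdir(tessdata_dir):
        raise IOError("tessdata folder not found: %s" % tessdata_dir)
    # Keep the original file name, since Tesseract derives the language
    # name from it (myfont.traineddata -> language "myfont").
    target = os.path.join(tessdata_dir, os.path.basename(traineddata_file))
    shutil.copyfile(traineddata_file, target)
    return target
```

The sketch only copies the file; it does not cover the training itself.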

I cannot help you further, since I do not have any experience with that. The text/OCR feature has not changed in 5 years and is still as the original developers left it. You do not have any further options besides trying to use optimized regions.

You might find more about this by searching Google (Sikuli + tesseract), since I know that some people have gone this way.
... e.g. this search (sikulix tesseract training) turned up:
https://answers.launchpad.net/sikuli/+question/273729

Another option for you might be to install Tesseract and run its command in a subprocess to get the text from an image, after preprocessing the image according to the Tesseract quality prerequisites.
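
A minimal sketch of that subprocess approach, assuming the `tesseract` command is installed and on the PATH, that your version accepts `stdout` as the output name (recent 3.x releases do, as far as I know), and that the image has already been saved to disk and preprocessed. The helper names are made up for illustration:

```python
import subprocess


def build_tesseract_command(image_path, lang="eng", executable="tesseract"):
    """Build the argument list for: tesseract <image> stdout -l <lang>."""
    # Using "stdout" as the output name asks the tesseract CLI to print
    # the recognized text instead of writing a .txt file.
    return [executable, image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng", executable="tesseract"):
    """Run tesseract in a subprocess and return the recognized text."""
    # Raises subprocess.CalledProcessError if tesseract exits with an error.
    output = subprocess.check_output(
        build_tesseract_command(image_path, lang, executable)
    )
    return output.decode("utf-8").strip()
```

Calling `ocr_image("capture.png", lang="myfont")` would then use a custom trained language, provided `myfont.traineddata` is in the tessdata folder of that Tesseract installation (`capture.png` and `myfont` are placeholder names).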

RaiMan (raimund-hocke) said on 2016-09-25:

a nice toy:
http://www.lmdfdg.com/?q=sikulix+tesseract+training
