How to improve OCR accuracy or give it a font

Asked by Kamil

The OCR in SikuliX is not the best: 20% of the scans I do are incorrect. Is there a way to tell it which font the game is using, to improve accuracy? Or even to define individual characters (a, b, c, etc.), so that when it sees a character, it knows what it is? Or is there any other way to improve OCR accuracy? Also, is there a way to use an online OCR service with Sikuli (https://www.newocr.com/)?

In short, how can I improve OCR accuracy or use an online OCR instead with Sikuli?

Question information

Language: English
Status: Answered
For: SikuliX
Assignee: No assignee

RaiMan (raimund-hocke) said :
#1

You might use the Tesseract 3 features to learn the font and then add the resulting files to Sikuli's tessdata folder.

To use the website, you would have to save your image to a file and create some automation to control the web workflow.

Kamil (k-danek) said :
#2

I don't know anything about Tesseract. Could you tell me how I can make it learn a font? Also, where is Sikuli's tessdata folder usually located? I don't think I'll go with online OCR, because it would take too much time.

RaiMan (raimund-hocke) said :
#3

To learn more about Tesseract (which the mentioned website also uses), look here:
https://github.com/tesseract-ocr/tesseract

If you have installed SikuliX with Tesseract/OCR support, the tessdata folder location is shown by:
print Settings.OcrDataPath
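
As a rough sketch of the "add the resulting files" step: Tesseract training produces a `<lang>.traineddata` file, and that file needs to end up in the tessdata folder. The helper name `install_traineddata` and both of its parameters are made up for illustration. The tessdata folder is passed in explicitly because its exact location depends on your installation (take it from the path printed above).

```python
import os
import shutil


def install_traineddata(traineddata_file, tessdata_dir):
    """Copy a trained language file into a tessdata folder.

    Hypothetical helper: tessdata_dir is whatever folder your
    SikuliX installation reports, it is not discovered here.
    Returns the path of the copied file.
    """
    if not traineddata_file.endswith(".traineddata"):
        raise ValueError("expected a .traineddata file: " + traineddata_file)
    if not os.path.isdir(tessdata_dir):
        raise IOError("tessdata folder not found: " + tessdata_dir)
    target = os.path.join(tessdata_dir, os.path.basename(traineddata_file))
    shutil.copyfile(traineddata_file, target)
    return target
```

Whether SikuliX then picks up the new language automatically is not something this thread confirms, so treat the copy as only the first step.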

I cannot help you further, since I do not have any experience with that. The text/OCR features have been unchanged for 5 years and are still as the original developers left them. Within SikuliX itself, you do not have any further options besides trying to use optimized regions.

You might find more about this by searching Google (Sikuli + tesseract), since I know that some people have gone this way.
For example, a search for "sikulix tesseract training" turned up this:
https://answers.launchpad.net/sikuli/+question/273729

Another option for you might be to install Tesseract yourself and run its command in a subprocess to get the text from an image, after preprocessing the image according to Tesseract's image quality prerequisites.
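
A minimal sketch of the subprocess approach, assuming the `tesseract` executable is installed and on the PATH, and that your Tesseract version accepts `stdout` as the output name (recent versions do). The function names `build_tesseract_command` and `ocr_with_tesseract` are hypothetical, and the image is expected to be saved and preprocessed already. It avoids Python 3-only features so it should also run under SikuliX's Jython.

```python
import subprocess


def build_tesseract_command(image_path, lang="eng", extra_args=None):
    """Build the argument list: tesseract <image> stdout -l <lang> [extra...]."""
    command = ["tesseract", image_path, "stdout", "-l", lang]
    if extra_args:
        # Version-specific options (page segmentation mode etc.) go here.
        command.extend(extra_args)
    return command


def ocr_with_tesseract(image_path, lang="eng", extra_args=None):
    """Run Tesseract on an image file and return the recognized text."""
    command = build_tesseract_command(image_path, lang, extra_args)
    process = subprocess.Popen(command,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    out, err = process.communicate()
    if process.returncode != 0:
        raise RuntimeError("tesseract failed: " + err.decode("utf-8", "replace"))
    return out.decode("utf-8", "replace").strip()
```

In a script you would first save the region to a file and then call something like `ocr_with_tesseract("score.png", lang="gamefont")`, where `score.png` and the language name `gamefont` are placeholders for your own image and trained font.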
