[2.0.5] OCR: does not work with a username containing non-ASCII characters

Asked by Máté Bálint

Problem tracked on GitHub: https://github.com/RaiMan/SikuliX1/issues/470


Is there a way to run OCR on a file? If yes, can somebody give me a code example for Java?
I'm using Java 1.8.

(My goal is to use OCR in the background.)

Thank you,


RaiMan (raimund-hocke) said :

What do you mean by file?
OCR is done against images (which might come from a file).

You might run a tesseract command or a Java Runnable in a subprocess.

Why background? What runs in foreground?
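
For the "tesseract command in a subprocess" suggestion, a minimal sketch in plain Java could look like the following. It assumes a `tesseract` executable is installed and on the PATH, which the thread does not confirm; the class and method names are hypothetical. The command form `tesseract <image> stdout -l <lang>` makes Tesseract print the recognized text to standard output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of SikuliX or Tess4J.
public class TesseractSubprocess {

    // Builds the command line: tesseract <image> stdout -l <lang>
    public static List<String> buildCommand(Path image, String lang) {
        return Arrays.asList("tesseract", image.toString(), "stdout", "-l", lang);
    }

    // Starts tesseract as a child process and returns what it printed to stdout.
    public static String runOcr(Path image, String lang)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(image, lang));
        // Let warnings and errors from tesseract go to this program's console
        // so they do not mix with the recognized text.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();

        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append(System.lineSeparator());
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: TesseractSubprocess <image-file>");
            return;
        }
        System.out.println(runOcr(Paths.get(args[0]), "eng"));
    }
}
```

Because the OCR runs in its own process, it does not need the screen or the mouse, so it should not interfere with whatever else is running on the machine.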

Máté Bálint (zodey) said :

By file I mean a PNG, JPG, JPEG, etc. (so it is downloaded).
My bot is running in the background and I want the OCR to run in the background as well.


RaiMan (raimund-hocke) said :

Is SikuliX on the class path?

If yes, which version?

If not, have a look at Tess4J (used with SikuliX) or (as mentioned) use a tesseract command in a subprocess.

Máté Bálint (zodey) said :

I don't know what the class path is, but I added it in IntelliJ, if that's it. It is the latest version (2.0.5).

RaiMan (raimund-hocke) said :

So you seem to be a newbie with Java???

Did you write the bot yourself or is it a fork from somewhere?

I ask this, because SikuliX features cannot really be used in "background", which means, while a SikuliX workflow is running, you cannot use the machine for something else.

Máté Bálint (zodey) said :

I wrote the bot with Selenium. Well, I think I have just left the beginner stage. I'm at the point where I learn new things easily, like SikuliX (there is no YouTube tutorial for SikuliX, but I can understand it from the documentation). I am currently learning OOP: I know method overloading, constructor overloading, encapsulation (at least something like that; I'm on mobile and don't know how to spell it correctly ;D), method chaining, and I understand static (can be used without creating a new object), etc. The reason I don't understand those keywords is that I learn Java in my native language (Hungarian). And I am 16 years old, if you want to know, so my English is a bit bad.

RaiMan (raimund-hocke) said :

Ok, wish you all the best.

In SikuliX class Image has a static convenience method wrapping features of class OCR:

String textRead = Image.text("path-of-imagefile.png");

Máté Bálint (zodey) said :

Thank you, but I need some help again.
I wrote a simple app (it just prints the textRead string), but an exception came up.

Error opening data file C:\Users\Bálint István\AppData\Roaming\Sikulix\SikulixTesseract\tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Exception in thread "main" java.lang.Error: Invalid memory access
        at net.sourceforge.tess4j.TessAPI1.TessBaseAPIGetUTF8Text(Native Method)
        at net.sourceforge.tess4j.Tesseract1.getOCRText(Tesseract1.java:497)
        at net.sourceforge.tess4j.Tesseract1.doOCR(Tesseract1.java:303)
        at net.sourceforge.tess4j.Tesseract1.doOCR(Tesseract1.java:276)
        at net.sourceforge.tess4j.Tesseract1.doOCR(Tesseract1.java:257)
        at net.sourceforge.tess4j.Tesseract1.doOCR(Tesseract1.java:241)
        at org.sikuli.script.TextRecognizer.doRead(TextRecognizer.java:376)
        at org.sikuli.script.TextRecognizer.readText(TextRecognizer.java:335)
        at org.sikuli.script.OCR.readText(OCR.java:710)
        at org.sikuli.script.OCR.readText(OCR.java:695)
        at org.sikuli.script.Image.text(Image.java:1381)
        at hu.webnode.zodey.App.main(App.java:8)

How can I set the "TESSDATA_PREFIX" environment variable to the "tessdata" directory?
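
One way to supply TESSDATA_PREFIX from Java, sketched here under assumptions: the variable can be set for a child process through `ProcessBuilder.environment()`. This only affects a `tesseract` command started as a subprocess; it does not change the environment of the already running JVM, so it would not by itself fix the in-process call made through `Image.text`. The directory `C:\ocr\tessdata` and the class name are hypothetical, and whether the variable must point at the `tessdata` folder itself or at its parent depends on the Tesseract version.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of SikuliX or Tess4J.
public class TessdataEnv {

    // Creates a ProcessBuilder whose child process sees TESSDATA_PREFIX.
    public static ProcessBuilder withTessdata(Path tessdataDir, String... command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        // environment() returns a modifiable copy of this JVM's environment
        // that is handed to the child process on start().
        pb.environment().put("TESSDATA_PREFIX", tessdataDir.toString());
        return pb;
    }

    public static void main(String[] args) {
        // Assumed location of the language data; adjust to the real folder.
        Path tessdata = Paths.get("C:\\ocr\\tessdata");
        ProcessBuilder pb = withTessdata(tessdata, "tesseract", "img.png", "stdout");
        System.out.println("TESSDATA_PREFIX for child process: "
                + pb.environment().get("TESSDATA_PREFIX"));
    }
}
```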

RaiMan (raimund-hocke) said :

Then you have to check your setup.

I have this simple main:

package com.sikulix.sikulixrun;

import org.sikuli.script.App;
import org.sikuli.script.Image;
import org.sikuli.script.support.RunTime;

public class SXTest {

    public static void main(String[] args) {

        // Read the text from an image file using the static convenience method
        final String text = Image.text("C:\\Users\\rmhde\\SikuliX\\_205\\img.png");
    }
}


... and in my IDEA project I have in the Project Settings -> Modules -> Dependencies a pointer to the location where I have downloaded the SikuliX IDE jar.

Everything works as expected.

Máté Bálint (zodey) said :

Strange, I did everything as you wrote it down. Before, I used Maven with VSCode, but I downloaded IntelliJ for this, set everything up, and still get the same error. I think I need to add something to my path, but I'm not sure.

RaiMan (raimund-hocke) said :

Nothing extra needed on path when using IDEA and SikuliX this way.

Delete the folder
C:\Users\Bálint István\AppData\Roaming\Sikulix

and try again.

Another test might be to delete the mentioned folder, start the SikuliX IDE from the command line and check whether this works:
print selectRegion().text()

... at script run select an area containing text

Máté Bálint (zodey) said :

I created a video of it: https://streamable.com/89w4v3

[error] script [ asd ] stopped with error in line 1
[error] java.lang.Error ( java.lang.Error: Invalid memory access )
[error] --- Traceback --- error source first
line: module ( function ) statement
1: main ( <module> ) print selectRegion().text()
[error] --- Traceback --- end --------------

RaiMan (raimund-hocke) said :

Again delete the folder
C:\Users\Bálint István\AppData\Roaming\Sikulix

run the IDE as before from command line, but add -v -c as parameters

as script use
print SCREEN
r = selectRegion()
print r.text()

and send me the output you get on the command line to my mail address shown at https://github.com/RaiMan

RaiMan (raimund-hocke) said :

The user got a workaround.

Accepted as a bug and tracked on GitHub.
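
Since the thread title attributes the failure to non-ASCII characters in the user name, a quick diagnostic like the following could tell whether a given setup is likely to be affected. This is only a sketch: the class name is hypothetical, and it merely inspects the `user.home` system property; it neither reproduces nor fixes the bug.

```java
// Hypothetical diagnostic, not part of SikuliX.
public class PathAsciiCheck {

    // Returns true if every character of s is in the 7-bit ASCII range.
    public static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String home = System.getProperty("user.home");
        if (isAscii(home)) {
            System.out.println("Home folder path is pure ASCII: " + home);
        } else {
            System.out.println("Home folder path contains non-ASCII characters: " + home);
        }
    }
}
```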