[2.0.5] OCR: does not work with a username containing non-ASCII characters

Asked by Máté Bálint

Problem tracked on GitHub: https://github.com/RaiMan/SikuliX1/issues/470
---------------------------------------------------------------------

Hello!

Is there a way to run the OCR on an image file? If yes, can somebody give me a code example for Java?
I'm using Java 1.8.

(My goal is to use OCR in the background.)

Thank you,

Zodey

Question information

Language: English
Status: Solved
For: SikuliX
Assignee: No assignee

RaiMan (raimund-hocke) said :
#1

What do you mean by file?
OCR is done against images (might come from a file)?

You might run a tesseract command or a Java Runnable in a subprocess.

Why background? What runs in foreground?

Máté Bálint (zodey) said :
#2

Hello!
By file I mean a PNG, JPG, JPEG, etc. (so it's downloaded).
My bot is running in the background and I want the OCR to run in the background as well.

Thanks,
Zodey

RaiMan (raimund-hocke) said :
#3

Is SikuliX on class path?

If yes, what version?

If not, have a look at Tess4J (used with SikuliX) or (as mentioned) use a tesseract command in a subprocess.
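
For reference, a minimal sketch of the subprocess route mentioned above. It assumes the `tesseract` command-line tool is installed and on the PATH, which this thread does not confirm for the asker's machine. The class name `TesseractCli` and its methods are made up for illustration and are not part of SikuliX or Tess4J. Passing `stdout` as the output base makes tesseract print the recognized text instead of writing a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs the tesseract CLI as a child process.
final class TesseractCli {

    // Builds the command line: tesseract <image> stdout -l <lang>
    public static List<String> buildCommand(String imagePath, String lang) {
        return Arrays.asList("tesseract", imagePath, "stdout", "-l", lang);
    }

    // Starts tesseract, collects its standard output, and returns it as text.
    public static String readText(String imagePath, String lang)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(imagePath, lang));
        // Let tesseract's diagnostics go to this program's stderr.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();

        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("tesseract exited with code " + exit);
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("usage: TesseractCli <image-file>");
            return;
        }
        System.out.println(readText(args[0], "eng"));
    }
}
```

Because the OCR runs in its own process, it does not touch the screen, mouse, or keyboard, which seems to fit the "background" requirement described in post #2.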

Máté Bálint (zodey) said :
#4

I don't know what the class path is, but I added it in IntelliJ, if that's it. It is the latest version (2.0.5).

RaiMan (raimund-hocke) said :
#5

So you seem to be a newbie with Java???

Did you write the bot yourself or is it a fork from somewhere?

I ask this, because SikuliX features cannot really be used in "background", which means, while a SikuliX workflow is running, you cannot use the machine for something else.

Máté Bálint (zodey) said :
#6

I wrote the bot with Selenium. Well, I think I have just left the beginner stage. I'm at the point where I learn new things easily, like SikuliX (there is no YouTube tutorial for SikuliX, but I can understand it from the documentation). I am currently learning OOP: I know method overloading, constructor overloading, encapsulation (or something like that; I'm on mobile and not sure how to spell it ;D), method chaining, and I understand static (it can be used without creating a new object), etc. The reason I don't understand those keywords is that I learn Java in my native language (Hungarian). And I am 16 years old, if you want to know, so my English is a bit bad.

RaiMan (raimund-hocke) said :
#8

Ok, wish you all the best.

In SikuliX, the class Image has a static convenience method that wraps features of the class OCR:

String textRead = Image.text("path-of-imagefile.png");

Máté Bálint (zodey) said :
#9

Thank you, but I need some help again.
I wrote a simple app (it just prints the textRead string), but an exception came up.

Error opening data file C:\Users\Bálint István\AppData\Roaming\Sikulix\SikulixTesseract\tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Exception in thread "main" java.lang.Error: Invalid memory access
        at net.sourceforge.tess4j.TessAPI1.TessBaseAPIGetUTF8Text(Native Method)
        at net.sourceforge.tess4j.Tesseract1.getOCRText(Tesseract1.java:497)
        at net.sourceforge.tess4j.Tesseract1.doOCR(Tesseract1.java:303)
        at net.sourceforge.tess4j.Tesseract1.doOCR(Tesseract1.java:276)
        at net.sourceforge.tess4j.Tesseract1.doOCR(Tesseract1.java:257)
        at net.sourceforge.tess4j.Tesseract1.doOCR(Tesseract1.java:241)
        at org.sikuli.script.TextRecognizer.doRead(TextRecognizer.java:376)
        at org.sikuli.script.TextRecognizer.readText(TextRecognizer.java:335)
        at org.sikuli.script.OCR.readText(OCR.java:710)
        at org.sikuli.script.OCR.readText(OCR.java:695)
        at org.sikuli.script.Image.text(Image.java:1381)
        at hu.webnode.zodey.App.main(App.java:8)

How can I set the "TESSDATA_PREFIX" environment variable to the "tessdata" directory?
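
On the general question of setting the variable from Java: a running JVM cannot change its own environment, so for the JVM itself the variable has to be set before it starts (for example in the shell or in the IDE's run configuration). What Java can do is set it for a child process through `ProcessBuilder.environment()`. The sketch below shows only that. The class name `TessdataEnv` and the directory `C:\tessdata` are hypothetical, and this thread does not confirm that SikuliX 2.0.5 honors the variable at all.

```java
import java.util.Map;

// Hypothetical helper: prepares a child process that sees TESSDATA_PREFIX.
final class TessdataEnv {

    // Returns a ProcessBuilder for the given command with TESSDATA_PREFIX
    // set to tessdataDir in the child's environment only.
    public static ProcessBuilder withTessdataPrefix(String tessdataDir, String... command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        // environment() starts as a copy of this JVM's environment and is mutable.
        Map<String, String> env = pb.environment();
        env.put("TESSDATA_PREFIX", tessdataDir);
        return pb;
    }

    public static void main(String[] args) {
        // Assumed directory and command, for illustration only; nothing is started here.
        ProcessBuilder pb = withTessdataPrefix("C:\\tessdata", "tesseract", "img.png", "stdout");
        System.out.println(pb.environment().get("TESSDATA_PREFIX"));
    }
}
```

Calling `start()` on the returned builder would launch the command with the variable in place; the parent JVM's own environment stays unchanged.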

RaiMan (raimund-hocke) said :
#10

Then you have to check your setup.

I have this simple main:

package com.sikulix.sikulixrun;

import org.sikuli.script.Image;

public class SXTest {

    public static void main(String[] args) {

        // Run OCR on an image file and get the recognized text
        final String text = Image.text("C:\\Users\\rmhde\\SikuliX\\_205\\img.png");
        System.out.println(text);

        // End the JVM explicitly
        System.exit(0);

    }
}

... and in my IDEA project, under Project Settings -> Modules -> Dependencies, I have an entry pointing to the location where I downloaded the SikuliX IDE jar.

Everything works as expected.

Máté Bálint (zodey) said :
#11

Strange, I did everything as you wrote it down. Before, I used Maven with VSCode, but I downloaded IntelliJ for this, set everything up, and still get the same error. I think I need to add something to my path, but I'm not sure.

RaiMan (raimund-hocke) said :
#12

Nothing extra needed on path when using IDEA and SikuliX this way.

Delete the folder
C:\Users\Bálint István\AppData\Roaming\Sikulix

and try again.

Another test might be to delete the mentioned folder, start the SikuliX IDE from the command line and check whether this works:
print selectRegion().text()

... at script run select an area containing text

Máté Bálint (zodey) said :
#13

I created a video of it: https://streamable.com/89w4v3

[error] script [ asd ] stopped with error in line 1
[error] java.lang.Error ( java.lang.Error: Invalid memory access )
[error] --- Traceback --- error source first
line: module ( function ) statement
1: main ( <module> ) print selectRegion().text()
[error] --- Traceback --- end --------------

RaiMan (raimund-hocke) said :
#14

Again delete the folder
C:\Users\Bálint István\AppData\Roaming\Sikulix

run the IDE as before from command line, but add -v -c as parameters

as script use
Debug.on(3)
print SCREEN
r = selectRegion()
print r.text()

and send me the output you get on the command line to my mail address shown at https://github.com/RaiMan

RaiMan (raimund-hocke) said :
#15

The user got a workaround.

Accepted as a bug and tracked on GitHub.
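
Since the title attributes the failure to a non-ASCII user name, a quick check along these lines may help others see whether they are affected. It is only a sketch: the class name `TessdataPathCheck` is made up, and the tessdata location is an assumption modeled on the path in the error message in post #9, so it may differ on other systems or SikuliX versions.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic: does the tessdata path contain non-ASCII characters?
final class TessdataPathCheck {

    // True if any character in the path lies outside 7-bit ASCII.
    public static boolean hasNonAscii(String path) {
        for (int i = 0; i < path.length(); i++) {
            if (path.charAt(i) > 127) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String appData = System.getenv("APPDATA");
        if (appData == null) {
            System.out.println("APPDATA is not set (probably not Windows)");
            return;
        }
        // Assumed location, taken from the error message in this thread.
        Path trained = Paths.get(appData, "Sikulix", "SikulixTesseract",
                "tessdata", "eng.traineddata");
        System.out.println("path: " + trained);
        System.out.println("exists: " + Files.exists(trained));
        System.out.println("non-ASCII in path: " + hasNonAscii(trained.toString()));
    }
}
```

If the file exists but the path contains non-ASCII characters, the situation matches the one reported here; see the GitHub issue linked at the top for the current state.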