How can I detect multiple languages in the same image?

Asked by Melissa 33 on 2019-04-19

I have some images in which I want to detect text, but the text is in many languages. They are pictures of name lists with some Russian, some Japanese, some English and some Korean names. So far I have been using text() on simple English-only name-list images, and it worked nearly perfectly, but now that I want to detect four or more languages in one image at a time, I don't know what to do.

Basically, my script looks at the English image and prints the names for me like this:
1. jason doe
2. john doe
3. sam doe
4. mary doe

But I don't know how to make it work on multilingual name-list images. Is it even possible to search for many languages in one image simultaneously?

Question information

Language: English
Project: Sikuli
Assignee: none
RaiMan (raimund-hocke) said : #1

The main problem you have: there is no feature to guess a language. With the SikuliX text features, which are based on Tesseract, you need to know the language before using them. And for every language other than eng, you have to prepare your setup (add the matching traineddata).
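For the setup part: Tesseract expects one `<code>.traineddata` file per language in its tessdata folder, and `eng`, `rus`, `jpn` and `kor` are Tesseract's codes for the four languages in the question. A minimal Python 3 sketch that lists which of those files are still missing, assuming you know where the tessdata folder of your SikuliX setup is (that location is not stated here, so it is passed in as a parameter; `missing_traineddata` is a hypothetical helper, not SikuliX API):

```python
from pathlib import Path

# Tesseract language codes for the four languages in the question (assumed set).
WANTED = ["eng", "rus", "jpn", "kor"]


def missing_traineddata(tessdata_dir, languages=WANTED):
    """Return the codes that have no <code>.traineddata file in tessdata_dir."""
    folder = Path(tessdata_dir)
    return [code for code in languages
            if not (folder / f"{code}.traineddata").is_file()]
```

Usage: `missing_traineddata("/path/to/tessdata")`, where the path is a placeholder for your own tessdata folder; an empty list means every wanted language file is present.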

So even if you read the text in several selected languages in parallel (I am not sure whether the text feature is currently thread-safe at all), you would still have to decide which variant of a name to choose.
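One possible way to make that decision, sketched in Python 3 under stated assumptions: every candidate reading comes with an engine confidence from 0 to 100 (Tesseract reports word confidences on that scale, but whether and how SikuliX exposes them is not covered in this thread), readings made up mostly of symbols are treated as garbage, and the most confident remaining reading wins. `Reading`, `name_like_ratio` and `choose_variant` are hypothetical names for illustration only.

```python
from typing import NamedTuple, Optional, Sequence


class Reading(NamedTuple):
    lang: str          # Tesseract language code, e.g. "rus"
    text: str          # what was read with that language
    confidence: float  # assumed engine confidence, 0-100


def name_like_ratio(text: str) -> float:
    """Share of non-space characters that can appear in a name (letters, digits, . - ')."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(ch.isalnum() or ch in ".-'" for ch in chars) / len(chars)


def choose_variant(readings: Sequence[Reading],
                   min_ratio: float = 0.8) -> Optional[Reading]:
    """Drop garbage-looking readings, then return the most confident one (None if none is left)."""
    plausible = [r for r in readings if name_like_ratio(r.text) >= min_ratio]
    if not plausible:
        return None
    return max(plausible, key=lambda r: r.confidence)
```

The 0.8 threshold is an arbitrary starting point; the filter only removes obvious garbage, so the confidence value does most of the deciding.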

If you are using 1.1.4, have a look at the new features for reading lines of text (collectLines), which give you the regions of the lines and their text content.
This feature might help you get closer to a solution.
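Because collectLines gives one region per line, and each line holds one name, the language can be decided per line instead of per image. A hedged Python 3 sketch of that loop: `read_line(region, lang)` is a hypothetical stand-in for re-reading one line region with one language and returning `(text, confidence)`; it is not an actual SikuliX function, and `region` can be any object.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical: read_line(region, lang) re-reads one line region with one
# language and returns (text, confidence). It is not a SikuliX function.
ReadLine = Callable[[object, str], Tuple[str, float]]


def read_names(regions: Iterable[object], languages: Sequence[str],
               read_line: ReadLine) -> List[str]:
    """For every line region, keep the reading with the highest confidence."""
    names = []
    for region in regions:
        readings = [read_line(region, lang) for lang in languages]
        text, _confidence = max(readings, key=lambda reading: reading[1])
        names.append(text.strip())
    return names


def numbered(names: Sequence[str]) -> str:
    """Format the names like the output shown in the question: '1. jason doe'."""
    return "\n".join(f"{i}. {name}" for i, name in enumerate(names, start=1))
```

Deciding per line keeps a Korean name from being forced through the Russian model just because the rest of the image is mostly Cyrillic.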

TestMechanic (ndinev) said : #2


If you share more about the problem, there may be another solution.

1. Can you copy text from those "images"? Are they pure images, or do they exist as text, for example on a web page?
2. You could try to OCR them with each language in sequence until you get reasonable results.
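Suggestion 2 as a small Python 3 sketch: try the languages one after another and stop at the first result that passes a plausibility check. `ocr(image, lang)` is a hypothetical callable wrapping whatever OCR call you actually use, and `looks_like_name` is a deliberately simple, assumed check (non-empty, only letters, spaces, hyphens and apostrophes).

```python
from typing import Callable, Optional, Sequence, Tuple


def looks_like_name(text: str) -> bool:
    """Assumed plausibility check: non-empty, only letters, spaces, hyphens or apostrophes."""
    stripped = text.strip()
    return bool(stripped) and all(ch.isalpha() or ch in " -'" for ch in stripped)


def first_reasonable(image: object, languages: Sequence[str],
                     ocr: Callable[[object, str], str],
                     is_reasonable: Callable[[str], bool] = looks_like_name
                     ) -> Optional[Tuple[str, str]]:
    """Try each language in order; return (lang, text) for the first plausible result, else None."""
    for lang in languages:
        text = ocr(image, lang)  # hypothetical OCR wrapper, one call per language
        if is_reasonable(text):
            return lang, text
    return None
```

Because the first acceptable reading wins, put the most frequent language first. A wrong-language model can still return letter-only text, so a check like this works better when combined with a confidence value.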

TestMechanic (ndinev) said : #3

Melissa, sorry for misspelling your name in my previous post.
