Inconsistency in Match.text() - OCR related

Asked by Nanni Sunil

I tried this as a follow-up to https://answers.launchpad.net/sikuli/+question/188070

The scenario is to extract text from an image stored in the file system.

I tried the following:

f = Finder("/Users/nanni/test-sikuli/source.png")
f.find("/Users/nanni/test-sikuli/to-extract.png")

while f.hasNext():
    m = f.next()
    print m  # Match[31,83 533x54 score=1.00 target=center]

    # I want the text from the matched area of the image. Instead, this call
    # treats the match coordinates as SCREEN coordinates and extracts whatever
    # text is on the screen at that position.
    print m.text()

I guess REGION always considers SCREEN co-ordinates to extract the text.

Is that so?
Is there a way to extract text from an image stored on the file system?

Question information

Language: English
Status: Solved
For: SikuliX
Assignee: No assignee
Solved by: Nanni Sunil
RaiMan (raimund-hocke) said :
#1

No, this is not so; this is a bug.

Use this instead:

print Region(m).text()

Nanni Sunil (sunil-jayaprakash) said :
#2

Actually, both m.text() and Region(m).text() produce the same result, and both act on the SCREEN instead of the match.

Here was the sample script.
========================
print "starting"

f = Finder("/Users/nanni/test-sikuli/source.png")
f.find("/Users/nanni/test-sikuli/to-extract.png")

while f.hasNext():
    m = f.next()

    print "From Region"
    print Region(m).text()
    print "From Match"
    print m.text()
    print "exists"

print "done"
========================

Output:
========================
starting

Found. Trying to extract Text.

From Region
Question#188121
~

From Match
Question#188121
~
done
========================
In both cases, Question#188121 was present on the SCREEN but not in "/Users/nanni/test-sikuli/source.png".

Is this expected? If not, I could file a bug.

I am still unable to find a way to extract text from an image on the file system.

RaiMan (raimund-hocke) said :
#3

Sorry, you are absolutely right with your first finding:
I guess REGION always considers SCREEN co-ordinates to extract the text.

And so do Match (Region subclass) and Location. Sorry for the fast but wrong answer.

If you want to have a solution for that, you must step down to the Java API and work directly on buffered images (which is possible from the Sikuli script level since it is Jython - Java API: http://sikuli.org/doc/java-x/).

If you want to file a feature request, you could ask for the possibility of using a stored/buffered image as a "virtual screen".
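
A minimal sketch of the buffered-image idea, for illustration only. It uses just the standard Java classes java.awt.Rectangle and java.awt.image.BufferedImage, not any Sikuli call; the class name MatchCrop and the method cropMatch are hypothetical. It cuts the match rectangle out of the image itself, so the coordinates are interpreted relative to the image and never relative to the SCREEN. Running OCR on the cropped image is a separate step that is not shown here.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of Sikuli: crops a match rectangle
// out of an image that was loaded from the file system.
final class MatchCrop {

    // x, y, w, h are the match values, interpreted as IMAGE coordinates.
    // The rectangle is clipped to the image bounds before cropping,
    // because getSubimage throws if the rectangle leaves the image.
    static BufferedImage cropMatch(BufferedImage source, int x, int y, int w, int h) {
        Rectangle bounds = new Rectangle(0, 0, source.getWidth(), source.getHeight());
        Rectangle clipped = bounds.intersection(new Rectangle(x, y, w, h));
        if (clipped.isEmpty()) {
            throw new IllegalArgumentException("match rectangle lies outside the image");
        }
        // The returned image shares its pixel data with the source image.
        return source.getSubimage(clipped.x, clipped.y, clipped.width, clipped.height);
    }
}
```

To use it, the source file could be loaded with javax.imageio.ImageIO.read(new File(...)) and the values of the Match (for example 31, 83, 533, 54 from the output in the question) passed as x, y, w and h.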

Nanni Sunil (sunil-jayaprakash) said :
#4

RaiMan, thanks for the input. I have raised a bug for Virtual Screen support. Also, as you said, I was able to achieve the functionality through the Java API using BufferedImage.

RaiMan (raimund-hocke) said :
#5

Fine. Could you post an example code snippet showing how to do this with BufferedImage?
It would help me and maybe others.

Nanni Sunil (sunil-jayaprakash) said :
#6

RaiMan (raimund-hocke) said :
#7

Really great. Thanks.

JonyGreen (jonygreen) said :
#8

You can try this free online ocr http://www.online-code.net/ocr.html to extract text from image.