How to capture an image of a Microsoft Word document and read its text?

Asked by Gayatri on 2018-07-27

Hi, I am new to Sikuli and am using it for test automation. I managed to open the doc, and now I want to search for some text in it. From reading other questions, I have learned that Sikuli uses Tesseract-based OCR, but how do I go forward with it?

I understand that it uses a region to capture the picture and then reads from it. My code is below, but it doesn't work.

click("1532589375522.png")
click("1532589388724.png")
doubleClick("docdocx-1.png")
Region = Region( Region(451,131,1000,872))
Region.text()
print(text)

Message:
[log] CLICK on L(579,1054)@S(0)[0,0 1920x1080] (571 msec)

[log] CLICK on L(105,311)@S(0)[0,0 1920x1080] (535 msec)

[log] DOUBLE CLICK on L(420,326)@S(0)[0,0 1920x1080] (593 msec)

<bound method org.sikuli.script.Screen.text of S(0)[0,0 1920x1080] E:Y, T:3.0>

Thank you xoxo

Question information

Language:
English
Status:
Answered
For:
Sikuli
Assignee:
No assignee
Last query:
2018-07-30
Last reply:
2018-07-30
RaiMan (raimund-hocke) said : #1

You need some basic Python knowledge for scripting in the IDE:

reg = Region(451,131,1000,872)
text = reg.text()
print text

- variable names must differ from already existing names (like Region, which is a class name in SikuliX)
- results of function calls (like reg.text()) should be assigned to variables for reuse (like text in print text)
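
Both points can be tried in plain Python, without SikuliX. This is only a sketch: `FakeRegion` is a hypothetical stand-in for SikuliX's `Region` class and returns a fixed string instead of running OCR. It shows that an uncalled method is just a method object (which seems to be what the `<bound method ...>` line in the question's output is), and that rebinding a class name to an instance makes the class unusable under that name.

```python
class FakeRegion(object):
    """Hypothetical stand-in for SikuliX's Region class (illustration only)."""

    def __init__(self, x, y, w, h):
        self.x, self.y, self.w, self.h = x, y, w, h

    def text(self):
        # A real Region would run OCR here; this returns a fixed string.
        return "sample text"


# Keep the class name intact and use a different name for the instance.
reg = FakeRegion(451, 131, 1000, 872)

# Assign the result of the call so it can be reused.
text = reg.text()

# Referencing the method without calling it yields the method object itself.
not_called = reg.text

# Rebinding a class name to an instance hides the class behind that name.
Shadow = FakeRegion
Shadow = Shadow(0, 0, 10, 10)
try:
    Shadow(1, 2, 3, 4)  # an instance is not callable like the class was
    shadow_still_callable = True
except TypeError:
    shadow_still_callable = False
```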

OCR with SikuliX: be aware that there are many caveats, especially regarding recognition quality.
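
Because OCR output is rarely character-perfect, an exact string comparison against the text returned by reg.text() may fail even when the text is visibly there. One possible workaround, sketched here in plain Python with the standard `difflib` module, is an approximate line-by-line comparison. The function names and the `threshold` value are assumptions made for illustration; none of this is part of the SikuliX API.

```python
import difflib


def normalize(s):
    """Lowercase and collapse runs of whitespace."""
    return " ".join(s.lower().split())


def contains_approx(ocr_text, expected, threshold=0.8):
    """Return True if some line of ocr_text roughly matches expected.

    threshold is an illustrative value, not a SikuliX setting.
    """
    target = normalize(expected)
    for line in ocr_text.splitlines():
        candidate = normalize(line)
        if not candidate:
            continue
        # An exact substring hit needs no fuzzy comparison.
        if target in candidate:
            return True
        # Otherwise compare similarity of the whole line to the target.
        ratio = difflib.SequenceMatcher(None, candidate, target).ratio()
        if ratio >= threshold:
            return True
    return False
```

With text = reg.text() from the snippet above, a call like contains_approx(text, "some expected phrase") would then tolerate a few misread characters. The threshold would need tuning against real output.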

Gayatri (sangasangasanga) said : #2

Hey, thank you for answering. It works, but the output is shown below. What is that supposed to mean?

[log] CLICK on L(580,1054)@S(0)[0,0 1920x1080] (556 msec)

[log] CLICK on L(934,420)@S(0)[0,0 1920x1080] (555 msec)

[log] DOUBLE CLICK on L(1260,425)@S(0)[0,0 1920x1080] (594 msec)

Share
Vle
Home
l)’
Move COP)’
to
Cut
Copy path
Paste ShOI’tCUt
sago
Delete Rename
Pin to Quick
COP)’
Paste
ew
folder
to
EICCESS
Clipboard
New
Organize
This PC
Downloads
utes ago
> ':. 3D objects
> in Desktop
> E Documents
> 5 Downloads
> Lb Music
> ‘ T3Pictur es
Today (4)
ie for scripting in the IDE
df

RaiMan (raimund-hocke) said : #3

This is the text that SikuliX reads inside the given region.

Gayatri (sangasangasanga) said : #4

@raimund-hocke
But that's not what is inside the given region. It gives me the file name, but what is inside the region is some sentences, like "apple".

This is my code:
click()
click()
doubleClick()
reg = Region ( Region(499,201,329,201) )
text = reg.text()
print text

Message:
[log] CLICK on L(581,1054)@S(0)[0,0 1920x1080] (534 msec)

[log] CLICK on L(105,311)@S(0)[0,0 1920x1080] (529 msec)

[log] DOUBLE CLICK on L(422,329)@S(0)[0,0 1920x1080] (560 msec)

docText.doc

Gayatri (sangasangasanga) said : #5

click("1532589375522.png")
click("1532589388724.png")
doubleClick("testingdocx-1.png")
reg = Region(499,201,329,201)
text = reg.text()
print text
Message:
[log] CLICK on L(581,1054)@S(0)[0,0 1920x1080] (542 msec)

[log] CLICK on L(105,311)@S(0)[0,0 1920x1080] (536 msec)

[log] DOUBLE CLICK on L(422,329)@S(0)[0,0 1920x1080] (546 msec)

OK, so it's not printing anything? But I specified the region.

RaiMan (raimund-hocke) said : #6

Not sure, but I guess this is to open the document:
click("1532589375522.png")
click("1532589388724.png")
doubleClick("testingdocx-1.png")

If yes, then you have to wait until the doc is really visible on the screen,
either with a timed wait or by waiting for some image that signals that the doc is readable on the screen.
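
A minimal sketch of the second option (waiting for an image) might look like this. The helper name and its parameters are made up for illustration; `screen` and `region` are assumed to behave like SikuliX Screen and Region objects, and `marker_image` would be a screenshot of something that only appears once the document is open.

```python
def read_text_when_ready(screen, marker_image, region, timeout=10):
    """Wait for marker_image to appear, then OCR the given region.

    screen and region are expected to behave like SikuliX Screen/Region
    objects; marker_image is a hypothetical screenshot of something that
    is only visible once the document is open.
    """
    # In SikuliX, wait() raises FindFailed if the image does not show up
    # within the timeout, so text() is only reached when the doc is there.
    screen.wait(marker_image, timeout)
    return region.text()
```

In the IDE this could be called as read_text_when_ready(Screen(), "doc_loaded.png", Region(499,201,329,201)), where "doc_loaded.png" is a placeholder for an image you capture yourself. For a plain timed wait, wait(3) after the doubleClick would do instead.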

... then
reg = Region(499,201,329,201)

is the region on the screen containing the text to be read.
Use this while testing:
reg = Region(499,201,329,201)
reg.highlight(2)

so you can see whether your region is really what you want.

To test what OCR can really do for you, I suggest opening the document manually and then running a script like this:
reg = selectRegion() # manually select the region
text = reg.text()
print text

For options to get the document window as a region, look at the App class features.
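
As one possible sketch of that idea, the helper below reads the frontmost window. It assumes that App.focusedWindow() returns the currently focused window as a Region, as the SikuliX documentation describes; the helper name is made up, and the App class is passed in as a parameter only so the function can be tried outside the IDE.

```python
def read_front_window(app_cls, highlight_secs=0):
    """OCR the frontmost window.

    app_cls is expected to be SikuliX's App class; it is a parameter only
    so that the function can be exercised with a stand-in object.
    """
    # Assumption: App.focusedWindow() returns the frontmost window as a Region.
    window = app_cls.focusedWindow()
    if highlight_secs:
        # Show a frame around the region for a visual check while testing.
        window.highlight(highlight_secs)
    return window.text()
```

In the IDE, once the document window is in front, this would be called as read_front_window(App, 2). Note that the whole window is read, including toolbars and menus, so the result may contain more than the document text.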
