X 1.0rc1 - Inconsistency in text recognition and matching, especially with integers-as-text!

Bug #695616 reported by Nash
This bug affects 4 people
Affects: SikuliX
Status: In Progress
Importance: Low
Assigned to: RaiMan
Milestone: (none listed)

Bug Description

I have noticed inconsistency in the Sikuli X text recognition and matching, especially when the text on the screen/region consists of integers. Below are the steps to reproduce the problem:

Step1: In your web browser open 'TextOnTheScreen.png' from the following web-link (http://qoydkw.bay.livefilestore.com/y1pSr0oDju9ndvMzJJfo_qWD2qEvMdDDwoKquMNWWkUYAxgrsVcXmETY8yW_LqIieOKXgwlTt8yxe8eQDY-SCySCFljo59mSSIC/TextOnTheScreen.png?psid=1)

Step2: In the Sikuli script below, set the region coordinates to cover the 'TextOnTheScreen.png' image displayed in your web browser. Note: the region must be significantly larger than the image, because you will zoom in and out of the page content in your browser in order to carry out multiple text recognition tests.
--------------------------------------------------------
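# Placeholder: substitute the screen coordinates of the area that
# contains the 'TextOnTheScreen.png' image in your browser.
r = Region(x, y, w, h)
t1 = "534438"
t2 = "534177"
t3 = "4438"
t = t2  # the text to search for; switch to t3 for TestCase4
m = r.exists(t, 0)
if m:
    if m.text() == t:
        popup("Exact match")
    else:
        # Show what was actually recognised around the match
        popup(m.nearby().text())
else:
    popup("Nothing found at all!")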
--------------------------------------------------------

Step3(TestCase1): Now, run the above sikuli script with t = t2. You might notice one or more of the following:
- Integers such as 7 & 8 are incorrectly recognised
- All the integers-as-text are recognised correctly
- Only a few integers are recognised correctly

Questions:
- Does the text recognition depend on how much OCR gets trained?
- Does the font-type & font-size matter?
- Should there be a mechanism for validating an exact-match for text recognition?
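A minimal sketch of what such a validation could look like on the script side, assuming the recognised text is already available as a plain string (for example from m.text() in the script above). The helper names is_exact_match and char_mismatches are hypothetical and not part of Sikuli; they only compare strings and do not touch the OCR itself.

```python
def is_exact_match(expected, recognized):
    """True only when the OCR output equals the expected string,
    ignoring leading and trailing whitespace."""
    return expected == recognized.strip()


def char_mismatches(expected, recognized):
    """Return (index, expected_char, recognized_char) for every position
    where the two strings differ. A character missing on either side is
    reported as None."""
    mismatches = []
    for i in range(max(len(expected), len(recognized))):
        e = expected[i] if i < len(expected) else None
        g = recognized[i] if i < len(recognized) else None
        if e != g:
            mismatches.append((i, e, g))
    return mismatches
```

In the script, is_exact_match(t, m.text()) could stand in for the plain m.text() == t comparison, and char_mismatches(t, m.text()) would show which digits (such as 7 or 8) were misread.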

Step4(TestCase2): Now go back to your browser, where the 'TextOnTheScreen.png' image is visible. Zoom out (Ctrl and -) and run the above Sikuli script again with t = t2. You should now notice variation in the text recognition results.

Step5(TestCase3): Repeat 'TestCase2' at different zoom levels in your browser; you might get varied text recognition results.

Step6(TestCase4): Now run the above Sikuli script with t = t3. You might notice that integers-as-text are not recognised at all, or, with luck, that some of them are recognised.

I hope this provides some test cases for troubleshooting the inconsistency in text/integer recognition.

Questions:
- Should text and integer recognition be independent operations? Should the application developer be able to choose whether to combine or isolate these operations as she/he sees fit? Would such a separation reduce OCR complexity and give better accuracy and control over the OCR?
- How would one reliably obtain an exact string match, e.g. with a click("String") operation?
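One part of the exact-match problem can be illustrated without OCR at all: t3 ("4438") is a suffix of t1 ("534438"), so a plain substring search cannot tell the two apart. The sketch below assumes the recognised text of a region is available as a string and looks for the target only as a whole number. The function name find_exact_number and the sample strings in the usage note are hypothetical, not Sikuli API and not the actual contents of the test image.

```python
import re


def find_exact_number(ocr_text, target):
    """Return the start index of `target` in `ocr_text` when it appears as a
    whole number (not embedded in a longer run of digits), else -1."""
    # The lookarounds reject matches with a digit directly before or after.
    pattern = r"(?<!\d)" + re.escape(target) + r"(?!\d)"
    found = re.search(pattern, ocr_text)
    return found.start() if found else -1
```

With the hypothetical text "534438 534177", searching for "4438" gives -1 because it only occurs inside the longer number, while "534177" is found as a whole token. This only addresses matching on already recognised text; it does not make the recognition itself more reliable.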

Tags: fkt-text
Nash (knshetty-live)
tags: added: and exact integers-as-text match matching recognition string text text+integer
Nash (knshetty-live)
description: updated
tags: removed: and integers-as-text matching string text
RaiMan (raimund-hocke)
summary: - Sikuli X 1.0rc1 - Inconsistency in text recognition and matching,
- especially with integers-as-text!
+ X 1.0rc1 - Inconsistency in text recognition and matching, especially
+ with integers-as-text!
bevan dequeker (sikuli-v) wrote :

It seems particularly vulnerable to integers-only, with no spaces
e.g.
"00 1 002003004" is seen as "001 002003004"
- almost perfect
but "01002003004" is seen as "'{ri[?I?)'?1?I?H?!?E!"
(same app, same font, same color)
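To put a number on "almost perfect" versus completely wrong, the two results above can be compared with the expected strings using a similarity ratio. This is only a rough sketch using Python's standard difflib module; the function name ocr_similarity is hypothetical, and the ratio is a generic string-similarity measure, not anything Sikuli reports.

```python
from difflib import SequenceMatcher


def ocr_similarity(expected, recognized):
    """Similarity between expected and recognised text, from 0.0 (nothing
    in common) to 1.0 (identical strings)."""
    return SequenceMatcher(None, expected, recognized).ratio()


# The two cases from this comment: (expected, recognised)
spaced = ocr_similarity("00 1 002003004", "001 002003004")
unspaced = ocr_similarity("01002003004", "'{ri[?I?)'?1?I?H?!?E!")
```

The spaced case scores close to 1.0 and the unspaced case close to 0.0, so a threshold on this ratio could be used to flag recognition results that are too far from the expected text.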

RaiMan (raimund-hocke)
Changed in sikuli:
status: New → In Progress
importance: Undecided → Low
assignee: nobody → RaiMan (raimund-hocke)
tags: added: fkt-text
removed: exact match recognition text+integer