Text find not working reliably

Asked by Gavin B on 2018-10-09

I am evaluating SikuliX and am starting with a simple scripted operation against a web app. I am on a Windows 10 machine, and image-based wait/find/click all work fine, but text mode is less predictable. Text finds work in some places and not in others, with no pattern that I can see. Sometimes a find works and then starts to fail after a few script runs. Changing monitor doesn't affect image finds, but it does affect text finds: text finds that worked fine when my laptop was on a remote monitor don't work on the laptop's own display. If anything, I'd have expected the images to be more affected.

At this point I am just using very simple findText("word or two") calls.

1) Any advice on how to make text extraction more robust?

2) Is there some way to debug so I can get some insight into what's going wrong?

Question information

Language: English
Status: Answered
For: Sikuli
Assignee: No assignee
Last query: 2018-10-11
Last reply: 2018-10-12

RaiMan (raimund-hocke) said : #1

Using version 1.1.4?

If not, then try with that and read:
https://sikulix-2014.readthedocs.io/en/latest/news.html

Gavin B (gavin-brebner-orange) said : #2

Yes - I'm on the latest 1.1.4

RaiMan (raimund-hocke) said : #3

My experience with the new Tesseract implementation based on Tess4J in 1.1.4 is very positive.

Generally: If you did not change anything in your script context, then something changed on the screen, which might be:
- background
- font
- anti-aliasing
- different rendering

... which obviously is the case when switching between an external and the built-in monitor.

To get a feeling for the results produced by OCR, you can use the text features described here:
https://sikulix-2014.readthedocs.io/en/latest/region.html#extracting-text-from-a-region

Run this one-line script in the IDE:
print selectRegion().text()

It lets you select the region of interest and then reads the text it contains.

The features collect... might be helpful as well.
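
To get some insight into why a findText call misses, it can help to compare the OCR output with the phrase you are looking for. The following is only a sketch in plain Python, not part of SikuliX: the helper names normalize and diagnose are hypothetical, and the sample OCR text is made up for illustration. It assumes you pass in the string returned by text().

```python
import difflib
import re


def normalize(text):
    """Collapse runs of whitespace and lowercase, so layout differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def diagnose(ocr_text, phrase):
    """Return (found, closest_line) for a phrase against OCR output.

    found        -- True if the normalized phrase occurs in any normalized OCR line
    closest_line -- the OCR line most similar to the phrase, or None if there are no lines
    """
    wanted = normalize(phrase)
    lines = [normalize(line) for line in ocr_text.splitlines()]
    lines = [line for line in lines if line]  # drop empty lines
    found = any(wanted in line for line in lines)
    # cutoff=0.0 means "always return the best candidate, however poor"
    closest = difflib.get_close_matches(wanted, lines, n=1, cutoff=0.0)
    closest_line = closest[0] if closest else None
    return found, closest_line


# Assumed usage in the SikuliX IDE (not executed here):
#   found, closest = diagnose(selectRegion().text(), "word or two")
```

If found is False, the closest line shows what OCR actually read in place of your phrase (for example a "1" read for an "l"), which hints at whether font, size, or background is the problem.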

If you like, you can send me screenshots containing the relevant text and tell me what you are looking for.
sikulix---at---outlook---dot---com

Gavin B (gavin-brebner-orange) said : #4

Looks like fonts make a big difference. Large bold characters seem to work OK, but less prominent text, e.g. hints in input boxes, is not found.

I tried the print selectRegion().text(), but unless I am very careful to exclude some of the images etc. on the page, I get:

```
[error] script [ HPEHome ] stopped with error at line --unknown--
[error] Error caused by: Traceback (most recent call last):
  File "C:\Users\brebner\sikulix\HPEHome.sikuli\HPEHome.py", line 58, in <module>
    main()
  File "C:\Users\brebner\sikulix\HPEHome.sikuli\HPEHome.py", line 53, in main
    people_finder()
  File "C:\Users\brebner\sikulix\HPEHome.sikuli\HPEHome.py", line 24, in people_finder
    print(selectRegion().text())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 83: ordinal not in range(128)
```

RaiMan (raimund-hocke) said : #5

The text features return Unicode strings.

The Python print statement fails on them when they contain non-ASCII characters, such as the u'\xb0' (degree sign) in your traceback.

Try with
uprint("some Unicode text")

… mind the brackets - it is a function!
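
The error in the traceback can be reproduced without SikuliX, which may make it clearer what is going on. This is a minimal sketch in plain Python; the helper names can_encode_ascii and ascii_safe are hypothetical, and the sample string is made up. It is an alternative for logging, not a replacement for uprint.

```python
def can_encode_ascii(text):
    """Return True if text survives the ASCII encoding that a plain print attempts."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # Same error class as in the traceback above.
        return False


def ascii_safe(text):
    """Return text with every non-ASCII character replaced by a backslash escape."""
    return text.encode("ascii", "backslashreplace").decode("ascii")


sample = u"25\xb0C"  # hypothetical OCR result containing a degree sign
```

Here can_encode_ascii(sample) is False, while ascii_safe(sample) gives a string that any ASCII-only console can print, at the cost of showing the degree sign as an escape sequence.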
