Best way to read a combobox, or any other control with unselectable content

Asked by Edmundo VN

I have a problem that I cannot solve in a simple way:

I need to manipulate the screen of a program that has comboboxes on it. To be more specific, I'm using Linux and the program is a Java program launched with Java Web Start. It's a normal address screen: I choose the state and then choose the city. The list of states is not that big, but the list of cities is.

I can try to use Tesseract with something like "region.right(xxx).text()" and keep reading the options until I get the one I want. The problem with Tesseract is that it misreads around 8% of all tries. I could not improve that by changing fonts (fortunately I can do that, because the program uses the TinyLaF Java package and I can change its theme); it still reads some characters wrong (I'm not dealing with accented characters here).

The city names don't match 100% when I choose one, so I need to choose it, read what was chosen and decide whether to stick with that choice. I can check whether what I chose is exactly what I want, or use SequenceMatcher from difflib and get a ratio; I will not discuss that. What I'm asking is how to read what I chose.
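For readers unfamiliar with the ratio check mentioned above, here is a minimal sketch using difflib.SequenceMatcher from the Python standard library. The function names and the 0.9 threshold are assumptions made for illustration, not values from this thread.

```python
from difflib import SequenceMatcher


def similarity(wanted, read_back):
    """Return a 0..1 ratio of how alike two names are, ignoring case and outer spaces."""
    a = wanted.strip().lower()
    b = read_back.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def accept_choice(wanted, read_back, threshold=0.9):
    """Keep the selected entry only if it is close enough to the wanted name.

    The threshold is an assumed value; tune it against real city names.
    """
    return similarity(wanted, read_back) >= threshold
```

The script would call accept_choice with the name it wanted and the text it read back from the combobox, and move on to another entry when the result is False.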

1. I can try to train Tesseract on a specific font, but I'm not sure Tesseract can reach 100% correct readings this way.

2. I found the program inside the Java cache, along with the jar file that has the text files with all the city names. I can load and process my entire country using Python when my script loads, sort the lists and find the position in the list of the city I want, and I will know exactly how it was written. It's a lot of work to choose a simple item in a combobox, and I will not have that luck with other programs.

In Linux I have wmctrl, which can manipulate a window but does not reach its internal controls. AutoHotKey for Microsoft Windows seems to have a command that does this, called ControlGetText: http://www.autohotkey.com/docs/commands/ControlGetText.htm

Does anyone know if there is a way to read the text of a control inside a program without using Tesseract, using Python or any external tool?

Question information

Language: English
Status: Solved
For: SikuliX
Assignee: No assignee
Solved by: Eugene Maslov
Revision history for this message
Eugene S (shragovich) said :
#1

First of all, I hope I understood what you are asking.

I wouldn't recommend using Tesseract for detecting menu items. As you mentioned yourself, it is not reliable. There are 2 more or less reliable options:

1. Create a screenshot (pattern) of every item in your combobox and then just detect it as any other Sikuli pattern.

2. In most cases the items are more or less static, meaning that they are not expected to disappear or change order on a regular basis. So you can just use keyboard actions to navigate to a specific list item. For example, if you know that a certain value comes second in your list, you can simulate pressing the DOWN key twice:

if item == "itemName1":
    type(Key.DOWN)
elif item == "itemName2":
    type(Key.DOWN)
    type(Key.DOWN)

and so on...

Of course, first you will have to select the combobox itself by clicking somewhere on it to make the item menu active.
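For a long list, the if/elif chain above could be replaced by looking up the item's position. This is only a sketch: presses_needed, select_item and the press_down callback are names invented here, and it assumes, as in the example above, that the first DOWN press selects the first item.

```python
def presses_needed(items, wanted):
    """Number of DOWN presses to reach `wanted`, assuming the first press selects items[0]."""
    return items.index(wanted) + 1


def select_item(items, wanted, press_down):
    """Call press_down() once per step; raises ValueError if `wanted` is not in the list."""
    for _ in range(presses_needed(items, wanted)):
        press_down()
```

Inside a Sikuli script the callback could be something like lambda: type(Key.DOWN), with the list of items kept in the same order as the combobox shows them.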

Eugene

Edmundo VN (edmundo-vn) said :
#2

Thanks for the answer.

1. Overkill, roughly 5570 items.
2. Yes, I'm doing that, but as I said, the content of the combobox is not typed exactly as the postal service (where I got the city) defines it. I need to choose, read what was chosen, and decide whether to stick with it.

Something easier than what I described would be to take the cities that I found inside the source and leave them as they are, only search whether what I want is there (and if I don't find it, fall back to matching ratios), then simply type it into the combobox (spaces are typed with shift+space).

But as I said, it's luck to be able to extract the combobox contents from a jar file; with other programs I will not have that luck.
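A rough sketch of that lookup, assuming the city names have already been loaded into a Python list. The helper names and the 0.8 cutoff are made up for illustration; get_close_matches comes from the standard difflib module.

```python
from difflib import get_close_matches


def normalize(name):
    """Collapse case and repeated whitespace so trivially different spellings compare equal."""
    return " ".join(name.lower().split())


def find_city(wanted, cities, cutoff=0.8):
    """Return the program's own spelling of `wanted`, or None if nothing is close enough.

    The cutoff is an assumed value, not one taken from this thread.
    """
    by_key = {normalize(c): c for c in cities}
    key = normalize(wanted)
    if key in by_key:  # exact hit after normalization
        return by_key[key]
    # Otherwise fall back to the closest name by matching ratio.
    close = get_close_matches(key, list(by_key), n=1, cutoff=cutoff)
    return by_key[close[0]] if close else None
```

The returned string is the spelling the program itself uses, so it is the one to type into the combobox.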

RaiMan (raimund-hocke) said :
#3

As far as I understand:
you are typing some characters into a combobox field, which selects an entry, and at some point while typing you want to verify that the selected entry is the one you want to use.

If this is right:
If Region.text() does not work reliably (and cannot be made reliable by dealing with some regular exceptions), then you have no chance with Sikuli.
Even if you could use images, you would first need a routine that maps text entries to images, and for this you need to know the text entries (here the loop starts ;-)

The only idea:
be satisfied with what you can decide using Region.text() and proceed.
Is there any chance to verify your choice some steps later?
If yes, you can go back and choose another entry when the check fails.

Best Eugene Maslov (emaslov1) said :
#4

Edmundo,
It's possible to write your own text recognizer with Sikuli and teach it yourself.

Make small screenshots of your letters without left and right margins, and write some code mapping the characters to png file names, e.g.
"b":"b.png",
"B":"b_up.png",
":":"colon.png",
" ":"space.png"
...

Take the string you want to find.
Use findAll on the first character of the string and put the matches into an array A of small regions.
Then, for each of the found regions, offset to the right by a distance a bit greater than the width of the next letter, extend a bit up and down, and try to find the second character there. If it is not found, remove the item from array A. If it is found, take the right border of the match as the new right border of the region; it will be the basis for the next offset.
Then repeat the offset to the right for the third, fourth, etc. characters, removing items where the latest character is not found.
By the end of the string, you will have your string found on the screen.

Before this procedure, set AutoWaitTimeout to zero, otherwise the search will be slow.

I use one like this; it is quite reliable.
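A minimal sketch of the procedure described above, written in plain Python so it can be followed without Sikuli. It is not the code actually used by the poster. The names find_string, glyphs, gap and pad are invented for this illustration, and find_all is a stand-in callable for a findAll-style search that returns (x, y, w, h) boxes; in a real script it would wrap a findAll call on a region.

```python
def find_string(text, glyphs, find_all, gap=3, pad=2):
    """Return (x, y, w, h) boxes where `text` appears, growing each box one character at a time.

    glyphs maps a character to (image_name, width_in_pixels).
    find_all(image, region) is a hypothetical helper returning the boxes of every
    match of `image` inside `region`; region=None means the whole screen.
    """
    first_image, _ = glyphs[text[0]]
    candidates = list(find_all(first_image, None))
    for ch in text[1:]:
        image, width = glyphs[ch]
        survivors = []
        for (x, y, w, h) in candidates:
            # Narrow strip just right of what is matched so far: a bit wider than
            # the next glyph and a bit taller than the current box.
            strip = (x + w, y - pad, width + gap, h + 2 * pad)
            hits = list(find_all(image, strip))
            if hits:
                hx, hy, hw, hh = min(hits)  # leftmost hit in the strip
                top = min(y, hy)
                bottom = max(y + h, hy + hh)
                # The new right border becomes the basis for the next offset.
                survivors.append((x, top, hx + hw - x, bottom - top))
            # Candidates with no hit are dropped simply by not being kept.
        candidates = survivors
    return candidates
```

Each surviving box covers the whole string, so its position can be used directly for a click or a further search.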

Edmundo VN (edmundo-vn) said :
#5

Thanks Eugene Maslov, that solved my question.

Edmundo VN (edmundo-vn) said :
#6

I did not use that solution. I ended up processing the source list of cities and made a translation table to simply type them into the combobox. Anyway, what you suggested seems interesting.

Eugene S (shragovich) said :
#7

@Eugene

Just curious how fast/slow that approach works for you. I tried something similar, but it was painfully slow and I decided to drop it.

Thanks

Eugene Maslov (emaslov1) said :
#8

@Eugene S
It's slower than searching for one simple image, but not really too slow.
The following problems can slow it down:
 - Old Sikuli 1.0.0: it couldn't reduce AutoWaitTimeout below 0.3 sec; in 1.1.0 this is solved.
 - Many words on the screen with the same beginning: it has to check many items many times and removes wrong ones from consideration quite late, so the search time increases. In this case, e.g. if many words are sorted alphabetically, it's better to start checking from the end of the string.
 - Words starting with I, 1, or l: findAll interprets a long vertical line as many first symbols, and then checking the second symbols takes long.
 - Low similarity. If it's 0.7 or less, the first pass picks up too many items, and the false positives are only removed at later stages, so the time grows. I usually use 0.85-0.95 for characters and 0.99 for the space; it works well if the background is the same as in the screenshots of the characters. If a different background requires reducing the similarity, then yes, the time will grow.

I understood that Edmundo has just a few list boxes on the screen. I also usually work with screens where the text is not very dense: just textual items, menus, popups and buttons. But if there is a full page of small text, like in a book, then I think that, having found many first letters, the procedure can take much longer to find the string than searching for a ready screenshot of a word.

Eugene S (shragovich) said :
#9

@Eugene Maslov

Thanks a lot for sharing the experience!
This is very interesting.

Edmundo VN (edmundo-vn) said :
#10

I don't know how you use it, but I just tested it. I restrict the region to where I want to recognize the text (it's one line), do a findAll with a similarity of 0.99 for all the characters and sort the matches by position; the result is a phrase. It takes 4 seconds. The worst case is when I leave a very big empty area at the right: then the space matches 30 times and it takes 10 seconds.
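The variant described in this post (search every character in one line, then sort the matches by position) could look roughly like the sketch below. It is plain Python with a stand-in find_all callable rather than real Sikuli calls, and read_line and glyphs are hypothetical names; trimming trailing spaces is an added assumption to cope with the empty area on the right.

```python
def read_line(glyphs, find_all, region):
    """Recognize one line of text: search every known glyph, then order the hits left to right.

    glyphs maps a character to its image name. find_all(image, region) is a
    hypothetical helper that returns (x, y, w, h) boxes for every match.
    """
    hits = []
    for ch, image in glyphs.items():
        for (x, y, w, h) in find_all(image, region):
            hits.append((x, ch))
    hits.sort()  # left-to-right order by x coordinate
    # Empty area right of the text matches the space glyph repeatedly, so trim it.
    return "".join(ch for _, ch in hits).rstrip()
```

Keeping the region tight around the text line reduces the number of space matches that have to be collected in the first place.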

Edmundo VN (edmundo-vn) said :
#11

Update, for anyone who wants to know how long it takes: it takes roughly 0.3 seconds to recognize a phrase in a combobox with 100% accuracy.