[research] One image, multiple sizes? Automatic detection based on one image in different rendering scenarios

Asked by Geo Ena

Hi,

Even though I have experience with other automation software, I am a newbie with Sikuli.
Therefore, quick questions:

1. I understand that, when looking for images, Sikuli recognises patterns ("shapes") rather than colors, right?

2. Is it capable to recognise the same image (pattern) but different sizes, while maintaining ratio?
To make it clear, let's say I need to detect an icon. I have the biggest version of the icon.
Can I use Sikuli to "scale down" image ratio and look for the icon image (e.g search 100% to 50% size)?

I am studying tutorials now, but any help towards this purpose (2) is much appreciated.

Thank you.

Question information

Language: English
Status: Solved
For: SikuliX
Assignee: RaiMan
Solved by: RaiMan

Best RaiMan (raimund-hocke) said :
#1

--- at 1:
The basic function used by Sikuli is OpenCV's matchTemplate().
It compares one image, pixel by pixel, against another image of equal or larger size.
The image to search for may even be a plain-color area (this works in most cases with the latest Sikuli version).
The images are internally converted to the RGB color model, ignoring the alpha channel.
Each match gets a similarity score between 0.0 and 1.0, so some differences in pixel intensity/color are tolerated, but accepting scores below 0.8-0.9 might lead to false positives.
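
To make the idea of a similarity score concrete, here is a minimal pure-Python 3 sketch of exhaustive template matching on tiny grayscale grids. It is an illustration only: the function names and the scoring formula (1 minus the normalized mean absolute difference) are assumptions chosen for this example, not what Sikuli or OpenCV's matchTemplate() actually computes.

```python
def similarity(patch, template):
    """Score in [0.0, 1.0]; 1.0 means every pixel is identical (values 0..255)."""
    total = 0
    count = 0
    for patch_row, tmpl_row in zip(patch, template):
        for p, t in zip(patch_row, tmpl_row):
            total += abs(p - t)
            count += 1
    return 1.0 - total / (255.0 * count)


def find_best(screen, template):
    """Slide the template over every position; return (score, x, y) of the best one."""
    th, tw = len(template), len(template[0])
    sh, sw = len(screen), len(screen[0])
    best = (-1.0, -1, -1)
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            patch = [row[x:x + tw] for row in screen[y:y + th]]
            score = similarity(patch, template)
            if score > best[0]:
                best = (score, x, y)
    return best


# A 5x4 "screen" containing a 2x2 pattern at column 1, row 1.
screen = [
    [0, 0, 0, 0, 0],
    [0, 200, 50, 0, 0],
    [0, 50, 200, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[200, 50], [50, 200]]
score, x, y = find_best(screen, template)
print(score, x, y)
```

Note that the template is only ever compared at its own pixel size, which is why a resized icon is not found by this kind of search.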

--- at 2:
Not a feature of Sikuli currently: the probe image is only found in the searched area (e.g. the screen) if it appears there with exactly the same dimensions in pixels (width x height).
Normally, you use a screenshot tool (e.g. the Sikuli IDE) to set up the image probes (possibly the same image in different sizes/resolutions). In some cases you also need different image sets for different platforms, because Sikuli currently is not robust against rendering differences for the same image on different platforms.
Generally, there are many possible approaches to the scaling issue, e.g. Java's built-in image processing, adding support via OpenCV, or simply using an external tool like ImageMagick's convert (by far the easiest approach) from within your Sikuli scripts/programs.
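
A minimal sketch of the ImageMagick route, written for standard Python 3: it builds one convert command per target size using the -resize option with a percentage. The function names, the file name icon.png, the output naming scheme and the chosen percentages are assumptions for illustration; on newer ImageMagick installations the command may be called magick instead of convert.

```python
import os
import subprocess


def resize_commands(source, percents):
    """Build one ImageMagick 'convert' command line per target size."""
    base, ext = os.path.splitext(source)
    commands = []
    for percent in percents:
        # Output name such as icon_50.png (naming scheme is an assumption).
        target = "%s_%d%s" % (base, percent, ext)
        commands.append(["convert", source, "-resize", "%d%%" % percent, target])
    return commands


def run_all(commands):
    """Execute the commands; requires ImageMagick to be installed and on the PATH."""
    for command in commands:
        subprocess.check_call(command)


commands = resize_commands("icon.png", [90, 75, 50])
for command in commands:
    print(" ".join(command))
```

Calling run_all(commands) would then create the scaled files. Whether they match what the GUI renders depends on the scaling algorithm the GUI itself uses.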

Geo Ena (gt0872) said :
#2

I didn't expect such a quick reply :)

1. So technically, Sikuli looks to me like a very intelligent use of OpenCV for automation purposes.
I now understand how it works. Thank you.

2. I knew about OpenCV being able to detect even moving objects within videos - much more complicated than mere resized images, but well, I never dared approach it.
I was just hoping Sikuli had some size/ratio function to help.
But I agree, using ImageMagick is simplest.
I'll probably get as many sizes as possible via ImageMagick and ask Sikuli to alert on first match.

With such helpful people, Sikuli has every chance of becoming the top choice for automation.
I'm looking forward to learning more...

Thanks again :)

Geo Ena (gt0872) said :
#3

Thanks RaiMan, that solved my question.

RaiMan (raimund-hocke) said :
#4

Thanks for the kind feedback.

--- I'll probably get as many sizes as possible via ImageMagick and ask Sikuli to alert on first match.
This is a bit tricky with Sikuli and needs some extra scripting/programming, since there currently is no Sikuli feature to search a set of images in one run.

A basic approach:

images = (image1, image2, image3, ... , imageN) # the image filenames

found = -1
for i in range(len(images)):
    # timeout 0: search only once instead of waiting the standard 3 seconds
    if exists(images[i], 0):
        found = i
        break  # stop at the first image found
if found < 0:
    print "no image found"
else:
    print "found image %d ( %s )" % (found, images[found])

This makes one search for every given image, without waiting the standard 3 seconds if the image is not present, and stops at the first one found.
Since searching the whole screen (as in this case) might take up to 1 second per search, it usually is a must to restrict the search region to a smaller area. Search times can thus be reduced to about 1/10.

More speed is possible if you delegate the searches to threads.
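
A minimal sketch of the threaded variant, assuming standard Python 3 and the threading module. The search argument stands in for a real call such as exists(image, 0); the fake_search function and the file names are hypothetical and only there so the example runs without Sikuli. This version waits for all searches to finish rather than stopping at the first hit.

```python
import threading


def find_first(images, search):
    """Run search(image) for every image in its own thread.

    Returns the index of the first image (in list order) that was found, or -1.
    """
    results = [False] * len(images)

    def worker(index):
        results[index] = bool(search(images[index]))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(images))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every search has finished

    for index, hit in enumerate(results):
        if hit:
            return index
    return -1


# Stand-in for what a real screen search would see; purely hypothetical.
visible_on_screen = {"icon_75.png"}


def fake_search(image):
    return image in visible_on_screen


images = ["icon_100.png", "icon_75.png", "icon_50.png"]
found = find_first(images, fake_search)
print(found)
```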

j (j-the-k) said :
#5

Is there any documented successful use of ImageMagick, or any other way, to support finding differently scaled images? I would be very interested in this feature.

RaiMan (raimund-hocke) said :
#6

I have not yet seen any document that talks about "How to use ImageMagick together with Sikuli".

So what exactly do you want to know?

e.g. How to scale one image to different sizes?

j (j-the-k) said :
#7

I use Sikuli for GUI automation, and Sikuli uses a number of images to interact with the GUI.
The GUI can be scaled so that icons and other elements change their size.

If Sikuli could detect the scale (e.g. by finding a reference image) and then find all images at the right scale, this would be an interesting feature.
However, this would only be interesting if it worked very simply and dependably.

RaiMan (raimund-hocke) said :
#8

All this is possible with Sikuli right now.

Since you always build Sikuli workflows with detailed knowledge about the GUI elements and their behavior, the image-based risk that your script fails is normally due to rendering differences, including scaling, and to differences in the environment setup that might influence the appearance of your GUI elements.

In all cases where an image might have a different size in pixels (width x height), or where the image content differs so that the Sikuli search can no longer find the image, you need different image sets for these situations.

For the first case (different size in pixels) it is rather easy to implement your solution based on one image set that contains each image in its largest possible version.
One of these images is prepared in different sizes, so your workflow can find out which size is the current one. You can then scale down all other images on the fly with this scale factor. The only caveat: your scaling approach must match the one used by the GUI.
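
The workflow described above can be sketched as follows, assuming standard Python 3. The search argument stands in for a real screen search such as exists(image, 0); the function names, file names and candidate scale factors are hypothetical and only illustrate the idea.

```python
def detect_scale(reference_variants, search):
    """reference_variants: list of (scale_factor, image_name), largest first.

    Returns the factor of the first variant that search() reports as visible,
    or None if none of them is found.
    """
    for factor, image in reference_variants:
        if search(image):
            return factor
    return None


def scaled_size(width, height, factor):
    """Pixel size of an image after scaling, rounded to whole pixels (at least 1)."""
    return (max(1, int(round(width * factor))), max(1, int(round(height * factor))))


# Hypothetical data: the reference icon was pre-scaled to three sizes.
variants = [(1.0, "ref_100.png"), (0.75, "ref_75.png"), (0.5, "ref_50.png")]
on_screen = {"ref_75.png"}  # stand-in for what a real search would see

factor = detect_scale(variants, lambda image: image in on_screen)
print(factor)
print(scaled_size(64, 48, factor))
```

Once the factor is known, every other image of the set can be resized to its scaled_size() before searching, with the caveat mentioned above about matching the GUI's own scaling algorithm.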

I do not think that you will get a general Sikuli feature for that in the near future, since this would be very system and application/browser dependent.

What might be a feature that could be implemented rather easily: a function that scales down an image by a given factor, with a choice of scaling algorithm from those most commonly used. This could be an additional aspect of the Pattern class.

Perkins (lperkins) said :
#9

Since you're using OpenCV, why not let it do the heavy lifting?

http://docs.opencv.org/doc/tutorials/features2d/feature_homography/feature_homography.html#feature-homography

This is a more expensive operation computationally speaking, but once the object is found at its new scale/rotation Sikuli makes it easy to automatically grab its image for subsequent re-use with simple template matching. Having this be an easily accessible function would take most of the pain out of dealing with multiple resolutions and/or the variations in output between different web browsers.

RaiMan (raimund-hocke) said :
#10

@ Perkins
thanks for the hint.

In principle, a feature-based matching algorithm might help in some situations where matchTemplate() cannot be used.
It might be especially useful for a "learning" phase such as the one mentioned above, and for other cases like trying to "learn a GUI".

The above example is no longer usable with Sikuli out of the box, since as of OpenCV 2.4 the SURF and SIFT parts are no longer free without restrictions.

I have this aspect on the list for future enhancements of Sikuli.

Perkins (lperkins) said :
#11

I am far from an expert, but the documentation mentions a couple of possible substitutes.

OpponentColorDescriptorExtractor
BriefDescriptorExtractor

If I'm reading this correctly, the descriptor extractors were designed to be interchangeable. I'm afraid I don't know enough about image analysis to be certain, and I don't currently have the means to compile test code particularly easily, but it might be worth a try.

When I have more time I'll see if I can come up with a working prototype program. It will likely be quite some time, however.

RaiMan (raimund-hocke) said :
#12

@Perkins
Absolutely right.

On what system would you "compile test code"?
Maybe I can give you some hints on a suitable setup that is compatible with mine.

Any contribution is always welcome.

I myself will start to work on these native enhancements (based on OpenCV and Tesseract) in the last quarter, to maybe get some improvements into version 1.1.

Perkins (lperkins) said :
#13

I generally write in Python on Linux. I'm afraid I don't know much about Java beyond the basic syntax, but I'm told that it is possible to compile Python scripts into Java, so that may be an option. The major constraint is time. I'm getting married in less than a month, so my schedule is rather swamped.

RaiMan (raimund-hocke) said :
#14

@Perkins

If you want to contribute anything to this area, it is entirely sufficient to write some Python scripts that use OpenCV 2.4 features through the available Python interface.

The effort is mainly in researching which OpenCV features to use to accomplish anything in this area.

One big step would be to have an approach for this:
- e.g. you have an image of an "OK" button
- you approximately know where in the app window this button should be
- the user has changed the GUI setup, so the button looks different or is larger with a different rendering on the pixel level (e.g. interpolated or deleted)

So the standard find() operation definitely fails.

Is there any chance to use one of the other OpenCV features to say: yes, I think the button is there, but it looks different?

Another often requested feature is to take some image (e.g. again a button) and identify all currently visible buttons that have the general outline of this button but may have a different width because of a different title text.
How do I prepare such a search image (e.g. via some Pattern class feature) to tell Sikuli that during find the inner parts should be ignored and only the left and right ends and the top/bottom edges are relevant? What approach would be used in terms of OpenCV?
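
One way to picture "ignore the inner parts" is a mask that marks which template pixels take part in the comparison. The following pure-Python 3 sketch is only an illustration of that idea: the function names, the one-pixel frame mask and the scoring formula are assumptions, not an existing Sikuli Pattern feature and not the OpenCV implementation. It also does not address buttons of different width, which would need the left and right ends to be matched separately.

```python
def masked_similarity(patch, template, mask):
    """Compare only the pixels where mask is 1; masked-out pixels are ignored."""
    total = 0
    count = 0
    for patch_row, tmpl_row, mask_row in zip(patch, template, mask):
        for p, t, m in zip(patch_row, tmpl_row, mask_row):
            if m:
                total += abs(p - t)
                count += 1
    return 1.0 - total / (255.0 * count)


def border_mask(width, height):
    """Mask that keeps the outer one-pixel frame and ignores the interior."""
    return [
        [1 if x in (0, width - 1) or y in (0, height - 1) else 0 for x in range(width)]
        for y in range(height)
    ]


# A 5x4 "button": bright frame (255) around a dark label area.
template = [
    [255, 255, 255, 255, 255],
    [255, 10, 20, 10, 255],
    [255, 20, 10, 20, 255],
    [255, 255, 255, 255, 255],
]
# Same frame, different label pixels inside.
other_button = [
    [255, 255, 255, 255, 255],
    [255, 90, 0, 90, 255],
    [255, 0, 90, 0, 255],
    [255, 255, 255, 255, 255],
]
mask = border_mask(5, 4)
full_mask = [[1] * 5 for _ in range(4)]
print(masked_similarity(other_button, template, mask))
print(masked_similarity(other_button, template, full_mask))
```

With the frame-only mask the two buttons score as identical, while the full comparison scores lower because the label pixels differ.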

I think all this could be done in the same way the OpenCV samples are set up: some probe images and a target image to act on.
No need to use anything with Sikuli or even Java (but one might of course use the Sikuli IDE to capture the relevant images ;-)

The resulting workflow of OpenCV functions would then have to be translated to Java + C++, which someone else could do.

But no need to hurry; that is not my current priority. If any positive findings become available later this year, the Sikuli world would be happy.

Anyway, all the best for your upcoming event ;-)

Josh (joshtan11) said :
#15

Hi Raimund/ Perkins,

Not sure if there has been any progress in the above matter concerning:

- e.g. you have an image of an "OK" button
- you approximately know where in the app window this button should be
- the user has changed the GUI setup, so the button looks different or is larger with a different rendering on the pixel level (e.g. interpolated or deleted)

I would appreciate it if you could point me in the direction of a workaround, if any :) I have looked through a lot of the FAQ without any luck finding one.

RaiMan (raimund-hocke) said :
#16

@Josh
there will only be better support for these situations in version 2.