Multi-Part Pattern: Sikuli + Regex = Sikex (searching simultaneously for more than one image)

Asked by Josh

This is not so much a question as an idea that I've been exploring for the last few months. Sikuli has an awesome GUI matching algorithm, which is useful for simple GUI matching, but when you get into more complicated situations it's pretty lame.

For Sikuli to be really precise in what it's matching, some higher-level logic needs to be built into it. Basically, take what regular expressions are for string matching and translate that into image matching. You could create a Sikuli expression that returns a match only if all the logic in the expression is satisfied. It would include AND/OR, positive/negative lookarounds, etc.

I'm not asking for this to be natively built into Sikuli, but maybe something that just uses Sikuli as the engine.

Question information

Language: English
Status: Solved
For: SikuliX
Assignee: No assignee
Solved by: Josh

RaiMan (raimund-hocke) said :
#1

This was planned "a long time ago" for the Pattern class, but has not been implemented so far.

A good example is finding some area on the screen by giving some frame elements and other things, combined with some logic.

The basic problem with Sikuli's current implementation: you can do all this right now using if/elif/else together with exists(img, 0), but all the find operations are done sequentially. So with an average pattern (3 to 5 elements) under optimal conditions (0.3 to 0.5 seconds per search on average) you can end up with more than 2 seconds. In the average, non-optimized situation this might be 5 to 10 seconds. That does not really motivate anyone to use this as a normal approach.

The above mentioned Pattern approach needs real parallel processing (efficiently using multi-processor-environments). This is what is really missing in Sikuli.

something like:

pat = PatternParallel("%1 and %2 or %3 and %4 and not %5", img1, img2, img3, img4, img5)
match = find(pat)

Where match will be the largest region around these elements.
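
To make the idea concrete, here is a minimal sketch in plain Python of how such an expression could be evaluated once every image has been searched concurrently. It does not use Sikuli at all: `find_all_parallel`, `evaluate`, `fake_find` and `FAKE_SCREEN` are hypothetical names, and the stub finder stands in for a real find operation.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def find_all_parallel(finder, images):
    """Call finder(image) for every image concurrently; a miss is None."""
    with ThreadPoolExecutor(max_workers=len(images)) as pool:
        return list(pool.map(finder, images))

def evaluate(expression, matches):
    """Evaluate e.g. '%1 and %2 or not %3' against the list of matches."""
    def substitute(token):
        index = int(token.group(1)) - 1  # %1 refers to the first image
        return str(matches[index] is not None)

    boolean_source = re.sub(r"%(\d+)", substitute, expression)
    # Only and/or/not, parentheses and True/False may remain before eval.
    allowed = r"(?:\s|True|False|and|or|not|\(|\))*"
    if not re.fullmatch(allowed, boolean_source):
        raise ValueError("unsupported expression: %r" % expression)
    return eval(boolean_source, {"__builtins__": {}}, {})

# Stub "screen": image name -> (x, y, w, h); img3 and img5 are not visible.
FAKE_SCREEN = {
    "img1.png": (10, 10, 20, 20),
    "img2.png": (200, 10, 20, 20),
    "img4.png": (10, 300, 20, 20),
}

def fake_find(image):
    return FAKE_SCREEN.get(image)

images = ["img1.png", "img2.png", "img3.png", "img4.png", "img5.png"]
matches = find_all_parallel(fake_find, images)
matched = evaluate("%1 and %2 or %3 and %4 and not %5", matches)
```

Python's own operator precedence applies here (not, then and, then or), so the expression reads as (%1 and %2) or (%3 and %4 and not %5). Whether real Sikuli searches can safely run in parallel threads is a separate question that this sketch does not answer.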

With some effort you could simulate this even now, using some Region.observe(FOREVER, background = True), since these observes are threaded, which gives some parallel effect. The challenge is the coordination of the handlers in this case.

see: https://answers.launchpad.net/sikuli/+question/183655

Josh (sammysnake) said :
#2

Ah interesting.

I have already implemented the basics of something like this using a naming convention in my images, where it allows you to create a sequence / series of images that it has to match to successfully match a region. I am finding more complex situations where it would be helpful to have such things like positive/negative lookarounds so my current solution is not complete.

Additionally for scripts to be more powerful it would be helpful to know how the pattern was matched (instead of just returning a region) to deal with a certain representation of a visual element.

I'll try and flesh out this idea a bit more and present my implementation when I'm done or update my progress.

The linked question was started by me also :)
RaiMan, thanks for your responses. You're a great resource for this project.

Josh (sammysnake) said :
#3

Another design pattern that I am using that I find really useful is storing matching logic in the PNG files themselves in an extra data block.

That way if there is an image that you want to be matched with a higher similarity you can encode that into the image itself and have your "sikuli script" less cluttered. Additionally this logic can be shared by multiple scripts that use the same image without adding additional code to each.

Here is another thing you can do when matching a sequence of images, say, for instance, you are looking for 3 corners of an application window. This is how something like this could play out:

Region1-Image1 (Top-left corner eg. application icon)
Context-Previous: None - Applied to the previous region = null
Context-Current: Similar(1.0) - Applied to the pattern that is used for find()
Context-Next: RegionRight() - Applied to the region that is returned for image2

...returns region on match for image2..

Region1-Image2 (Top-right corner eg. maximize, minimize, close)
Context-Previous: RegionNearby(10) - Applied to image1 region (give some leeway for this match)
Context-Current: None - Apply to this pattern match
Context-Next: RegionBelow() - Apply to region passed to image3

Region1-Image3 (Bottom-right corner eg, bottom border)
Context-Previous: RegionNearby(10)
Context-Current: Similar(1.0)
Context-Next: None

-- After all 3 corners are matched, find the min/max x/y coordinates to get the window size. This allows you to find resizable windows more robustly than doing a large image match.
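
The min/max step can be sketched as follows, assuming each match is a plain (x, y, w, h) tuple; `bounding_region` is a hypothetical helper name and the corner coordinates are made up for illustration.

```python
def bounding_region(matches):
    """Smallest (x, y, w, h) rectangle containing every (x, y, w, h) match."""
    left = min(x for x, y, w, h in matches)
    top = min(y for x, y, w, h in matches)
    right = max(x + w for x, y, w, h in matches)
    bottom = max(y + h for x, y, w, h in matches)
    return (left, top, right - left, bottom - top)

# Three 16x16 corner matches: top-left, top-right, bottom-right.
corners = [(100, 50, 16, 16), (684, 50, 16, 16), (684, 434, 16, 16)]
window = bounding_region(corners)  # (100, 50, 600, 400)
```

Three corners are enough because the fourth one adds no new minimum or maximum.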

RaiMan (raimund-hocke) said :
#4

Interesting approach:

This would allow storing "multi-part" patterns as images.

How do you manage to store this "context" into a .png and how do you read it in your script?

Josh (sammysnake) said :
#5

Using a library called PyPNG: http://pypng.googlecode.com

I create an array with instances of the transforms, then pickle the array and store the resulting text string in the tEXt chunk.

My transformation class and related classes: http://pastebin.com/wjF6p3fh
Adds tEXt section to a PNG image: http://pastebin.com/svKhJAWV
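
For readers who cannot reach the pastebin links, here is a rough sketch of the same idea. It is not the code from those links: it uses only the Python standard library instead of PyPNG, and it stores JSON instead of a pickle (unpickling data from an untrusted image can execute arbitrary code). The keyword "sikuli-context" and all function names are assumptions made for this example.

```python
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(chunk_type, data):
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def iter_chunks(png_bytes):
    """Yield (type, data) for every chunk after the 8-byte signature."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    offset = 8
    while offset < len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[offset:offset + 4])
        chunk_type = png_bytes[offset + 4:offset + 8]
        yield chunk_type, png_bytes[offset + 8:offset + 8 + length]
        offset += 12 + length  # 4 length + 4 type + data + 4 CRC

def add_text(png_bytes, keyword, value):
    """Return a copy of the PNG with a tEXt chunk inserted before IEND."""
    payload = keyword.encode("latin-1") + b"\x00" + json.dumps(value).encode("ascii")
    out = [PNG_SIGNATURE]
    for chunk_type, data in iter_chunks(png_bytes):
        if chunk_type == b"IEND":
            out.append(make_chunk(b"tEXt", payload))
        out.append(make_chunk(chunk_type, data))
    return b"".join(out)

def read_text(png_bytes, keyword):
    """Return the decoded value stored under keyword, or None if absent."""
    for chunk_type, data in iter_chunks(png_bytes):
        if chunk_type == b"tEXt":
            key, _, text = data.partition(b"\x00")
            if key == keyword.encode("latin-1"):
                return json.loads(text.decode("ascii"))
    return None

def tiny_png():
    """Build a valid 1x1 8-bit grayscale PNG to experiment with."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))

context = {"previous": None, "current": {"similar": 1.0}, "next": "right"}
patched = add_text(tiny_png(), "sikuli-context", context)
restored = read_text(patched, "sikuli-context")
```

Image viewers ignore the extra chunk, so the patched file should still display like the original.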

RaiMan (raimund-hocke) said :
#6

Thanks for the information.

I had a deeper look at it and made some tests.

I am sure it does for you what you think it should do.

But as a general solution I have some concerns:

--- performance
each image has to be read twice (once for the context information and then again for Sikuli's find operation).
I think the information you are storing in the tEXt chunk could be encoded in the image file name as well.

--- extra support needed
To encode the information into the png-file, you need an extra support tool (might be integrated into IDE's Preview). If the information was encoded in the filename, you could do the image management directly from outside in the file system. Some simple naming conventions would be sufficient.

--- some double work
The fact, that some images should be processed in a specific order in some common context is more a workflow aspect, than an image attribute. You encode it with the images, from where you have to translate it back to a (ok: standardized) workflow. So why not write down the workflow directly (ok: should be supported by some functions/classes)

--- not on the Java level
since the kernel API of Sikuli is based on Java, your solution is only (in the sense of easy to use) available in the Jython layer. But as a general Sikuli feature it should be implemented on the Java level.

--- my conclusion
I still think, that a multi-part-Pattern class supported by some image file name conventions, is the appropriate design pattern for this feature.

RaiMan (raimund-hocke) said :
#7

I have made it a request bug.

Josh (sammysnake) said :
#8

I will elaborate on some of my design decisions, since there may be some confusion:

> The fact, that some images should be processed in a specific order in some common context is more a workflow aspect,
> than an image attribute. You encode it with the images, from where you have to translate it back to a (ok: standardized)
> workflow. So why not write down the workflow directly (ok: should be supported by some functions/classes)

I think we may have come to the same conclusion and there may be some confusion behind what details I am storing inside the PNG and what the naming convention is used for.

--- Naming Convention

I wanted the image naming to have some "magic" where the image name itself would designate whether it's a single image or whether there are other parts to be subsequently matched.

There are a few different situations that can happen:
-- Single
resourceName.png

-- Series - AND - All images must be matched to create a single region
resourceName-0.png
resourceName-1.png
...

-- Single image sequence - OR - Try and find resource 0 OR resource 1
resourceName[0].png
resourceName[1].png

-- Sequence-Series - OR + AND - Try and find all images in the series, if one fails try the next sequence
resourceName[0]-0.png
resourceName[0]-1.png
resourceName[0]-2.png - If this one doesn't match, proceed to next series

resourceName[1]-0.png
resourceName[1]-1.png
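
A small sketch of how these names could be resolved, assuming the set of existing file names is already known. `resolve` is a hypothetical function, not the code from the pastebin link; it returns a list of alternatives (OR), each of which is a list of files that must all match (AND). Like the approach described here, it probes 0, 1, 2, ... and treats the first missing file as the end.

```python
def resolve(resource, existing):
    """Return a list of alternatives (OR); each is a list of files (AND)."""
    def series(prefix):
        # Collect prefix-0.png, prefix-1.png, ... until one is missing.
        files, index = [], 0
        while "%s-%d.png" % (prefix, index) in existing:
            files.append("%s-%d.png" % (prefix, index))
            index += 1
        return files

    if resource + ".png" in existing:      # single image
        return [[resource + ".png"]]
    parts = series(resource)               # plain series (AND)
    if parts:
        return [parts]

    alternatives, index = [], 0            # sequence (OR), maybe of series
    while True:
        prefix = "%s[%d]" % (resource, index)
        if prefix + ".png" in existing:
            alternatives.append([prefix + ".png"])
        else:
            parts = series(prefix)
            if not parts:
                break
            alternatives.append(parts)
        index += 1
    return alternatives

existing = {
    "ok.png", "btn-0.png", "btn-1.png",
    "win[0]-0.png", "win[0]-1.png", "win[0]-2.png",
    "win[1]-0.png", "win[1]-1.png",
}
plan = resolve("win", existing)
```

A caller would then try each alternative in turn and accept the first one whose files all match.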

--- PNG attributes

There are three different contexts:
1) previous - Perform operations on Region context returned by previous match
2) current - Perform operations on Pattern of current find operation
3) next - Perform operations on the Region context for the next find operation

--- Image Seq/Series + PNG attributes

Combining these two ideas gives us the power to create a Region context in which the next image is matched, instead of searching the entire screen. Say, for example, image-0.png is matched successfully and image-0.png contains a .below() region transform. When image-1.png is matched, it will use the transformed region from image-0.png as the context in which it should be matched (e.g. matching the edges of a window).
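
The control flow can be sketched without Sikuli as follows. Regions are plain (x, y, w, h) tuples, `make_finder` builds a stub that stands in for a real find restricted to a region, and `TRANSFORMS` plays the role of the "next" context stored with each image. All names and coordinates are hypothetical.

```python
def contains(region, rect):
    """True if rect lies completely inside region (both are x, y, w, h)."""
    rx, ry, rw, rh = region
    x, y, w, h = rect
    return rx <= x and ry <= y and x + w <= rx + rw and y + h <= ry + rh

def below(match, screen):
    """Region under the match, same width, reaching the bottom of the screen."""
    x, y, w, h = match
    return (x, y + h, w, screen[1] + screen[3] - (y + h))

def make_finder(positions):
    """Stub finder: an image is 'found' only if it lies inside the region."""
    def find_in(region, image):
        rect = positions.get(image)
        if rect is not None and contains(region, rect):
            return rect
        return None
    return find_in

def match_series(images, find_in, screen, transforms):
    """Match images in order; each match sets the region for the next one."""
    region, matches = screen, []
    for image in images:
        match = find_in(region, image)
        if match is None:
            return None  # one miss fails the whole series
        matches.append(match)
        transform = transforms.get(image)
        region = transform(match, screen) if transform else screen
    return matches

SCREEN = (0, 0, 1024, 768)
TRANSFORMS = {"image-0.png": below}
SERIES = ["image-0.png", "image-1.png"]

aligned = make_finder({"image-0.png": (100, 50, 16, 16), "image-1.png": (100, 434, 16, 16)})
stray = make_finder({"image-0.png": (100, 50, 16, 16), "image-1.png": (500, 434, 16, 16)})

good = match_series(SERIES, aligned, SCREEN, TRANSFORMS)
bad = match_series(SERIES, stray, SCREEN, TRANSFORMS)
```

In the second case image-1.png is on the screen but not below image-0.png, so the series fails. That is the point of the region context: it eliminates matches that are in the wrong place.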

--

I don't think it is possible to store region transform details in a naming convention. For my own solution I am keeping the naming dead simple, so it is easy to identify resources from directory listings.

--

Here is the code for my imageRegion class, which performs all the operations discussed:
http://pastebin.com/VwABPs1C

Josh (sammysnake) said :
#9

I started this thread talking about regex because I had a bit of an epiphany that day: everything I was doing so far in my implementation was actually a mirror of what regex does for matching text-based patterns.

What I have implemented is powerful and works pretty damn well, but I was thinking this is probably not the complete solution. Creating an image-based expression language as a mirror of regular expressions might be closer to the complete solution set.

RaiMan (raimund-hocke) said :
#10

Thanks for the clarification (file name convention vs. information stored in png file).

I personally like your ideas, and there surely will be one or another who wants to use them.
That's why I made it a feature request bug.

But I still doubt that it is the preferred way to implement this as a general solution for Sikuli (see my concerns above).

*** encoding processing information in image filename - I think this is possible rather easily

--- grouping/sequencing (and/or)
This is what you already have (I would not use any special characters besides hyphen and underscore).

-- screen position of an image in the group
Thinking from the resulting final region you have 2 alternatives:
-1: relative right, left, above, below to another image in the sequence (rN, lN, aN, bN)
N is the number of the reference element, you can have as many of these attributes as needed (e.g. for image 3: r1a1b2 = right-above of image 1 and below image 2)
-2: absolute: max once per group: top-left, top-right, bottom-left, bottom-right (tl, tr, bl, br)
more than once: top-middle, left-middle, right-middle, bottom-middle (tm, lm, rm, bm)
the search direction for the middle elements depends on the previous corner element

A group can either use -1 (scattered pattern) or -2 (frame-like pattern)

-- resulting filenames

relative:
resourceName__01-01.png
resourceName__01-02__a1.png
resourceName__01-03__r1a1b2.png

absolute:
resourceName__01-01__tl.png
resourceName__01-02__tm.png
resourceName__01-03__tm.png
resourceName__01-04__br.png
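
Such names are easy to take apart again. The following sketch parses them with a regular expression; the first number is read as the group and the second as the element index, which is an interpretation of the examples above. `parse` and the dictionary keys are hypothetical.

```python
import re

FILENAME = re.compile(
    r"^(?P<name>.+?)__(?P<group>\d+)-(?P<index>\d+)(?:__(?P<attrs>\w+))?\.png$"
)
ABSOLUTE = {"tl", "tr", "bl", "br", "tm", "lm", "rm", "bm"}
DIRECTIONS = {"r": "right", "l": "left", "a": "above", "b": "below"}

def parse(filename):
    """Split a multi-part image filename into its encoded attributes."""
    found = FILENAME.match(filename)
    if found is None:
        raise ValueError("not a multi-part image name: %r" % filename)
    info = {
        "name": found.group("name"),
        "group": int(found.group("group")),
        "index": int(found.group("index")),
        "absolute": None,   # e.g. "tl" for the frame-like variant
        "relative": [],     # e.g. [("right", 1)] for the scattered variant
    }
    attrs = found.group("attrs") or ""
    if attrs in ABSOLUTE:
        info["absolute"] = attrs
    elif attrs:
        if not re.fullmatch(r"(?:[rlab]\d+)+", attrs):
            raise ValueError("unknown position attributes: %r" % attrs)
        info["relative"] = [
            (DIRECTIONS[letter], int(number))
            for letter, number in re.findall(r"([rlab])(\d+)", attrs)
        ]
    return info

scattered = parse("resourceName__01-03__r1a1b2.png")
framed = parse("resourceName__01-04__br.png")
```

The two variants cannot be confused, because a relative attribute always carries a number after the letter while the absolute codes are letters only.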

I think that these images should be searched generally with a min similarity of 0.9 or even 0.95, so there is no need to encode this with the images.

The resulting region is the smallest region, that contains all match regions of the group elements. It should remember the matches it is based on and the MultiPattern.

I would wrap this in a class MultiPattern() with a basic usage of p = MultiPattern(resourceName), which could be used in find operations just like the existing Pattern class.

RaiMan (raimund-hocke) said :
#11

more aspects:

-- challenge: performance:
Even taking into account, that the subsequent searches in a group can be done in restricted regions, the total time will add up to some seconds. This might not be acceptable in many situations.

-- challenge: how to restrict the search area
example: If you are searching something right of a given match/region, the search area extends to the screen border and has the height of the given match. If the searched image does not fit into the region, it will not be found. So this is a challenge for the capture process. Or you have to add some extra margins to the search areas, to be "capture fault tolerant".

-- challenge: get the correct next match
If there is more than one match in the restricted area for the next element: which one to take? To have a chance to make a decision (e.g. the nearest one), you have to use findAll() (which extends the search time further). A simple find does not guarantee to return the expected match.
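
Picking "the nearest one" could look like this, assuming a findAll-style search has already produced a list of candidates as ((x, y, w, h), score) pairs. `nearest_match` and the 0.9 threshold are assumptions for the example, and distance is measured between rectangle centers.

```python
import math

def center(rect):
    """Center point of an (x, y, w, h) rectangle."""
    x, y, w, h = rect
    return (x + w / 2.0, y + h / 2.0)

def nearest_match(reference, candidates, min_score=0.9):
    """Return the candidate closest to reference among those scoring well.

    candidates is a list of ((x, y, w, h), score); returns None if no
    candidate reaches min_score.
    """
    good = [c for c in candidates if c[1] >= min_score]
    if not good:
        return None
    ref_x, ref_y = center(reference)

    def distance(candidate):
        cx, cy = center(candidate[0])
        return math.hypot(cx - ref_x, cy - ref_y)

    return min(good, key=distance)

previous = (100, 100, 20, 20)
candidates = [
    ((400, 100, 20, 20), 0.97),  # good score, far away
    ((150, 100, 20, 20), 0.93),  # good score, close
    ((125, 100, 20, 20), 0.80),  # closest, but score too low
]
chosen = nearest_match(previous, candidates)
```

Here the highest-scoring candidate loses to a closer one, and the closest candidate is dropped because its similarity is too low.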

-- challenge: speed things up
The only approach to speed things up beyond restricting the search areas is to search the pattern elements in parallel. But then you have to search all elements on the whole screen or add some additional hints, to restrict the search areas, since you do not have any match information yet.

I will now make some tests in these challenge areas.

Josh (sammysnake) said :
#12

>> -- challenge: how to restrict the search area
>> Or you have to add some extra margins to the search areas, to be "capture fault tolerant".

For sequenced images, in the context-previous, I was using a region.nearby() and this seemed to be working pretty good. (Can be seen referenced in the png patcher script above)

--challenge: pre/post transforms stored in the image name
Using the image path search functions that are already native to Sikuli requires you to request the image by its full, exact file name. There are problems if the image resource name has non-predictable parts to it, e.g. resourceName__01-01__"tm".png

In my implementation of searching for sequence/series it will assume that a file exists, and if it gets a file not found error that is the end of that sequence.

--challenge: chaining region transforms
To really be able to indicate precisely the region where the next match will occur, it is useful to be able to chain region transforms together. It is also useful to be able to specify a limit on how far the region extends, as an argument to the transform functions.

eg: region.Below(20).Right(30).Nearby(30)

A little bit off topic, but I am finding it useful with multi-part matching: normally in Sikuli the arguments are (x, y, w, h) instead of (x1, y1, x2, y2). When doing transforms on regions it is helpful to abstract region transforms to the latter, so it is possible to do a delta transform. This transform takes the arguments (dx1, dy1, dx2, dy2), so if you want to shift the region to the right by 10 pixels and reduce the width by 20, the left edge moves by +10 and the right edge by 10 - 20 = -10, which would look like region.delta(10, 0, -10, 0)
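
A minimal sketch of chainable transforms and the delta transform, written as a small standalone class rather than against Sikuli's Region. The exact meaning of each transform is an assumption: below(n) and right(n) return the adjacent strip of size n, nearby(n) grows the rectangle by n on every side, and delta adds its four arguments to the two corners.

```python
from collections import namedtuple

class Rect(namedtuple("Rect", "x y w h")):
    """Immutable rectangle; every transform returns a new Rect, so calls chain."""
    __slots__ = ()

    def below(self, height):
        """Strip directly underneath, same width, given height."""
        return Rect(self.x, self.y + self.h, self.w, height)

    def right(self, width):
        """Strip directly to the right, same height, given width."""
        return Rect(self.x + self.w, self.y, width, self.h)

    def nearby(self, margin):
        """Grow by margin on all four sides."""
        return Rect(self.x - margin, self.y - margin,
                    self.w + 2 * margin, self.h + 2 * margin)

    def delta(self, dx1, dy1, dx2, dy2):
        """Move the top-left corner by (dx1, dy1) and the bottom-right by (dx2, dy2)."""
        x1, y1 = self.x + dx1, self.y + dy1
        x2, y2 = self.x + self.w + dx2, self.y + self.h + dy2
        return Rect(x1, y1, x2 - x1, y2 - y1)

start = Rect(100, 100, 50, 40)
chained = start.below(20).right(30).nearby(30)
shifted = start.delta(10, 0, -10, 0)  # 10 to the right, 20 narrower
```

Under this reading, shifting right by 10 and reducing the width by 20 works out to corner deltas of (10, 0, -10, 0): the left edge moves right by 10 and the right edge moves left by 10.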

Yanan (baoji58) said :
#13

What are the benefits of integrating Sikuli & Selenium?

Thanks,
Yanan

On Wed, Jan 18, 2012 at 3:20 PM, Josh <email address hidden> wrote:

> Question #184410 on Sikuli changed:
> https://answers.launchpad.net/sikuli/+question/184410
>
> Josh posted a new comment:
> [...]

RaiMan (raimund-hocke) said :
#14

@ Yanan

strange place for such a question ;-)

Sikuli: only what you can see on the screen can be searched or acted on. No access to the GUI elements in the way the underlying application has it.

Selenium: you have access to the DOM structure of a webpage and as such to the GUI elements in the same way you can do it with other DOM aware systems (like javascript).

So if you combine both systems (e.g. on the Java level) you get a mighty test and automation environment for web applications.

RaiMan (raimund-hocke) said :
#15

@Josh

--- challenge: pre/post transforms stored in the image name
I think that file handling can be solved rather easily (no need to specify absolute path names):
-- all images in a group have to be in the same folder (which might be on Sikuli's image path and even in the net)
-- the root element of a group has to have the fixed structure:
resourceName__01-01
-- from this file you get the absolute path and with this the containing folder
-- both in Python and Java it is possible to read the directory and find the rest of the images belonging to the group

So when creating a multi-part pattern ( e.g. p = MultiPattern(resourceName) ) no absolute path is needed and all the file handling is inside the class MultiPattern.
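
The directory scan described in these steps could be sketched like this. `group_files` is a hypothetical helper, and the demo creates empty placeholder files in a temporary folder only to have something to list.

```python
import os
import re
import tempfile

def group_files(root_path):
    """Given the path of the root image '<name>__GG-01.png', return the
    sorted names of all images in the same folder and the same group."""
    folder, root_file = os.path.split(os.path.abspath(root_path))
    found = re.match(r"^(.+?)__(\d+)-\d+(?:__\w+)?\.png$", root_file)
    if found is None:
        raise ValueError("not a group root image: %r" % root_file)
    prefix = "%s__%s-" % (found.group(1), found.group(2))
    return sorted(
        name for name in os.listdir(folder)
        if name.startswith(prefix) and name.endswith(".png")
    )

# Demo with empty placeholder files in a temporary folder.
demo_dir = tempfile.mkdtemp()
for name in ["window__01-01.png", "window__01-02__tm.png",
             "window__01-03__br.png", "window__02-01.png", "other__01-01.png"]:
    open(os.path.join(demo_dir, name), "w").close()

members = group_files(os.path.join(demo_dir, "window__01-01.png"))
```

Only the root file name has to be predictable; the position attributes of the other elements are discovered from the listing.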

--- so it is possible to do a delta transform.
Yes, many of these possible region transform functions are still missing in Sikuli.
Besides the ones you mentioned e.g. something like reg.belowIncluding().rightIncluding(), which now has to be built with:

reg.above(1).below().left(1).right()

works, but looks ugly.

--- Additionally being able to specify a limit on how far the region extends
I made some performance tests on this and found out, that there is not really a significant difference between specifying below() and below(100). For multi-part patterns you might support optimized searches by a specific element sequence (e.g. tl -> br and then all the other elements).

Josh (sammysnake) said :
#16

>>--- challenge: pre/post transforms stored in the image name
I think you're on to something there; this could definitely work. I'm still not convinced that this would be my preferred method though, as I think it will yield some pretty crazy filenames. I can see how putting hidden information in a PNG would not be the most user-friendly solution.

In terms of readability and maintenance, I picked the pickle/PNG tEXt storage route. Additionally, down the road, maybe there could be a GUI manager for modifying images like this.

>> --- Additionally being able to specify a limit on how far the region extends
The main reason I want to limit the size is not so much for speed, but so that the search region is really specific, to eliminate false positives.

One method I was using to locate a window I'll coin "anchoring". The problem: sometimes the titlebar of a window is fairly generic, and you can have multiple dialogs with the same titlebar created by the same program. What I would do is match a fairly large region inside the window, "the anchor". From there I know that the titlebar is so many pixels up and to the left, so I limit the region to only allow for this size. This allows me to match top-left, then I can do a right() to find top-right, etc.

RaiMan (raimund-hocke) said :
#17

>> --- Additionally being able to specify a limit on how far the region extends
understood and accepted.

But if you use findAll() internally, you do not need a limit value from the user for the restricted region, because you can select the nearest match as the most probable result among those with a high similarity.

This was my idea behind the two versions of specifying a pattern.

Josh (sammysnake) said :
#18

I already posted a dedicated thread for my framework, but I thought I would shamelessly put a plug in here as well.

Those interested in the ideas discussed in this thread can see my implementation of it here:
https://github.com/smysnk/sikuli-framework