Question #234686 “[research] Sukuli easy interaction with any so...” : Questions : SikuliX

Roman Podolyan (podolyan-roman) said on 2013-08-27:

#1

Well, why don't you try the thing you say:
- read a text file with some strings (image names)
- try to search for them
- and write reactions

If that works, you then have to look for third-party tools that can monitor folders/files for changes and run a script when necessary.
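
A minimal sketch of the three steps above, in plain Python. The names `process_requests`, `request.txt` and `result.txt` are made up for illustration, and the check itself is passed in as a function: inside a Sikuli script it could be something like `lambda name: exists(name, 0)`, while here a stand-in is used so the sketch runs without Sikuli.

```python
import os
import tempfile


def process_requests(request_path, result_path, is_visible):
    """Read image names from request_path, check each one, write the results."""
    with open(request_path) as f:
        names = [line.strip() for line in f if line.strip()]
    results = {}
    with open(result_path, "w") as out:
        for name in names:
            found = bool(is_visible(name))
            results[name] = found
            # One line per image: name, a tab, then "found" or "missing"
            out.write("%s\t%s\n" % (name, "found" if found else "missing"))
    return results


# Demo with a stand-in check that only "sees" imageA.png (hypothetical names).
tmp = tempfile.mkdtemp()
req = os.path.join(tmp, "request.txt")
res = os.path.join(tmp, "result.txt")
with open(req, "w") as f:
    f.write("imageA.png\nimageB.png\n")
demo = process_requests(req, res, lambda name: name == "imageA.png")
```

The "reaction" here is only a line in the result file; anything more specific would depend on what the other program expects to read.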

Fidel (fidelperez) said on 2013-08-28:

#2

Ok, I will, and post the results.
Yes, I think pretty much any programming language can read and write txt files, and watch for changes with a loop.
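
As a rough illustration of "watch for changes with a loop", the sketch below polls a file's modification time. The function name `wait_for_change` and its parameters are assumptions for this example, not part of Sikuli or any other tool mentioned here.

```python
import os
import tempfile
import time


def wait_for_change(path, last_mtime, interval=0.5, timeout=10.0):
    """Poll path until its modification time differs from last_mtime.

    Returns the new modification time, or None if timeout seconds pass first.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            mtime = os.path.getmtime(path)
        except OSError:
            mtime = None  # the file does not exist (yet)
        if mtime is not None and mtime != last_mtime:
            return mtime
        time.sleep(interval)
    return None


# Demo: a freshly created file counts as a change when nothing was seen before,
# and an untouched file makes the second call time out.
fd, watched = tempfile.mkstemp()
os.close(fd)
first = wait_for_change(watched, None, interval=0.01, timeout=1.0)
unchanged = wait_for_change(watched, first, interval=0.01, timeout=0.05)
```

Polling is the simplest option; it trades a small delay (the interval) for not needing any platform-specific file notification mechanism.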

RaiMan (raimund-hocke) said on 2013-08-30:

#3

I do not really understand the approach:
- a Sikuli script is a text file
- and it contains the image file names

So why an additional text file?

Just having a bunch of image files and checking whether they can be found or not is not a real problem, because you always need the information/setup of the screen (the visual context) to have a chance of finding the image.

In the end, this is what makes up a Sikuli script: a workflow that simulates user interactions, drives the application state from one point to the next, and uses images on the screen to act on or to check the current status.

There are 2 things that are currently missing in Sikuli:
- a recorder, that intelligently observes user actions and produces a runnable workflow as Sikuli script
- a top-level DSL (domain-specific language) to define a visual workflow including decisions and repeats

Fidel (fidelperez) said on 2013-08-30:

#4

Hi RaiMan:

I find it *really* interesting that you mentioned "a recorder, that intelligently observes user actions and produces a runnable workflow as Sikuli script", since my objective is something close to that. You said you don't understand the approach, but you guessed where my work is heading; I was very impressed by that.

Put *very* cryptically, what I am trying to do with this approach is real-time interaction between Sikuli and a Python script.
I am using outliner software (Leo Editor) to dynamically generate Python scripts, which are also run dynamically.
So, at a certain point, the Python script will need to find "Image A and Image B", because it needs them to do some other thing.
A few seconds later, and not for any Sikuli-related reason, it might need to find "Image C and D", and the first two are no longer necessary.
Instead of creating two Sikuli scripts and then running them, the cleanest way I could imagine for this sort of dynamic real-time interaction was to feed txt files with the image paths and the info required at that moment to a Sikuli script, and to extract the required data through data files as well.
This way the information goes systematically in and out, and that is the best way to let it interact programmatically with dynamic scripts: a structured, scalable input-output system.
This approach would avoid having to reload Sikuli every time the image requirements change or are updated.

Ideally, the required information about the screen setup would be picked up by the Sikuli script at startup. Maybe I'm not understanding this right.

This would actually be like using Sikuli as the "eyes" of a multi-level structure (the outliner I mentioned).
I want to add Sikuli to the outliner-based dynamic programming scheme I'm creating, since both will then benefit from each other:
Sikuli will get the advantages that my bigger project can bring (such as dynamic and interactive script creation), and my project will get the enhanced powers that Sikuli brings to it.
I'm still at the beginning of this project, but I'm starting to do some interesting things with it. I intend to release it as open source as soon as I have something worth showing.

On a side note (not closely related to the post, but it might give some more information), this is also the way I imagine a robot would work with images. At a certain point it would be searching for specific images. It would try to solve the problem it has at the moment by other means (other sensors: sound, touch, etc.), and when it solves that problem it would go on to the next. Instead of needing new scripts to be compiled on the robot itself in real time, if it only needs to specify the "image paths" it is searching for, far fewer resources are required.
Well, I don't know much about robotics, so this is just a personal view. I let those thoughts guide me towards my interactive-programming system, though.

Thank you, RaiMan, for your answer, and please forgive any misconceptions about the way Sikuli works or the way I'm doing this. I'm working hard to understand both better every day.

Fidel (fidelperez) said on 2013-08-31:

#5

Thanks RaiMan, that solved my question.

RaiMan (raimund-hocke) said on 2013-09-01:

#6

Since I do not think that your question is solved, here are some comments:

--- Leo Editor
... is an approach that allows you to "outline" a workflow (program flow) in the way people are used to when working with larger structured documents: the details are hidden behind a top-level structure that consists only of headlines. In the case of Leo, the details are dynamically created Python script snippets. Leo itself is scriptable using Python, so you would be able to add any needed/wanted additional or modifying features.

--- following a list of images ...
... would be some kind of "robot" that tries to find its way through a visual world by looking for the next image match around it, moving to that point and then looking for the next image.
But in Sikuli terms, the robot would do a bit more: each image has an additional aspect, namely an action to be performed if the robot "sees" the image.
Example: the next image might be that of a door, and the related actions might be one or more of: open, close, knock, go through, wait until it opens, crash it so it stays open forever, ...
This is exactly what you can do with Sikuli: wait for some image to appear and then click it or something else, to get things moving to the next visual state.
So in the end each Sikuli script is a workflow (or, speaking in Leo terms, an outline where the details are not hidden).
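
A small sketch of "each image has an action", written as plain Python with the screen check passed in. `run_workflow`, the image names and the logged actions are all hypothetical. In a Sikuli script, `wait_for` could be something like `lambda image, timeout: exists(image, timeout)`, which gives back a match or nothing, and an action could call `click(match)`.

```python
def run_workflow(steps, wait_for, timeout=10):
    """Run (image, action) steps in order; stop at the first image not seen.

    Returns the list of images that were handled.
    """
    done = []
    for image, action in steps:
        match = wait_for(image, timeout)
        if not match:
            break  # the expected visual state never showed up
        action(match)
        done.append(image)
    return done


# Demo with a stand-in "screen" on which only two of the three images exist.
log = []
visible = {"door.png": "match-door", "handle.png": "match-handle"}
steps = [
    ("door.png", lambda m: log.append("knock on " + m)),
    ("handle.png", lambda m: log.append("open " + m)),
    ("window.png", lambda m: log.append("never reached")),
]
handled = run_workflow(steps, lambda image, timeout: visible.get(image))
```

Keeping the steps as data is one way to get the outline-like view: the list reads like the headlines, and the actions carry the details.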

--- using Sikuli features with Leo ...
... will not work out of the box: Sikuli scripts are written in the Python language, but the interpreter used is Jython (a Java-based implementation of a Python interpreter at language level 2.5). So you will have the same problems using Sikuli in this context as when trying to use Sikuli with Python: it is not possible to use the Sikuli features directly (tightly coupled via API calls).
Possible solutions are all loosely coupled and based on some form of inter-process communication (batch scripts, running scripts in subprocesses, some kind of RPC, ...).
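
One of the loosely coupled options, running a script in a subprocess and reading what it prints, could look roughly like the sketch below on the Python side. The helper `run_and_collect` is an assumption for illustration, and the child process here is just a Python one-liner standing in for whatever command actually launches the Sikuli script, which is not shown.

```python
import subprocess
import sys


def run_and_collect(command):
    """Run command as a child process and return its non-empty output lines."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                            universal_newlines=True)
    out, _ = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("command failed with exit code %d" % proc.returncode)
    return [line for line in out.splitlines() if line]


# Stand-in child process; a real setup would start the Sikuli script instead.
child = [sys.executable, "-c",
         "print('imageA.png found'); print('imageB.png missing')"]
lines = run_and_collect(child)
```

This starts a new process for every request, so it does not avoid the startup cost; a long-running script fed through files or RPC would be the alternative if that cost matters.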

*** Conclusion: combining Sikuli with Leo does not make sense, since it moves the complexity to a higher level than needed. Your ideas can be implemented simply by using the Sikuli IDE. It is up to you to use an outline approach when creating your scripts (which in fact would be a top-down approach).

Fidel (fidelperez) said on 2013-09-01:

#7

First of all, RaiMan, I really appreciate the time you are taking to answer me. It is very valuable to me, and I think I'm learning a lot from studying your comments.

Second, here is a bit of explanation of why I don't completely agree (yet) with your conclusion.
The fact is that Leo can be used as a very powerful personal information manager (PIM) and data manager, because the nodes can also be links to websites, files, etc.
It can also move the information for you (so if you move one branch of the outline to another place, the files will also move if you want them to), etc.

Given that background, one of my plans is to support that PIM function with visual functions.

A small example of what I want to do:

Say I want to keep track of how much time I spend playing, and how much working, and I want this to be done automatically. So:

- Whenever Sikuli detects that a web browser is open, it will copy the website link and pass it to LeoEditor.
- Whenever Sikuli detects the Starcraft II icon (image), it will activate play mode.
- Whenever Sikuli detects LeoEditor (image), it will activate work mode.

One of the branches of my outline in Leo Editor holds the website links; some of them are tagged "Work mode" and others "Play mode".
So when it receives the information from Sikuli that website XXX was opened, it searches for the link among my stored ones and activates the corresponding mode.
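
The lookup described above can be sketched as two dictionaries, one from icon image to mode and one from stored link to mode tag. Everything here (`detect_mode`, the file names, the example URLs) is made up for illustration; how the tags are actually stored in the Leo outline is not covered.

```python
def detect_mode(seen_images, icon_modes, url=None, link_modes=None):
    """Pick a mode from the icons seen on screen, else from the link's tag.

    Returns None when neither an icon nor the link is known.
    """
    for image in seen_images:
        if image in icon_modes:
            return icon_modes[image]
    if url is not None and link_modes:
        return link_modes.get(url.strip())
    return None


# Hypothetical data standing in for the outline branches.
icon_modes = {"starcraft2.png": "play", "leo_editor.png": "work"}
link_modes = {
    "https://example.com/docs": "work",
    "https://example.com/games": "play",
}

mode_a = detect_mode(["starcraft2.png"], icon_modes)
mode_b = detect_mode([], icon_modes, "https://example.com/docs ", link_modes)
mode_c = detect_mode([], icon_modes, "https://example.com/unknown", link_modes)
```

An unknown link returns None rather than guessing, so the caller can decide whether to keep the previous mode or ask the user to tag the link.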

In another branch I keep the icons of the games I play, so when Sikuli is searching for game icons, it knows those are the images it has to look for.

This way, if I share this program (and assuming the new user also has "Work mode" and "Play mode" tags on his Leo web links), he will only have to go to the "Game icons" branch and replace the images with those of the games he plays.
He won't have to edit the Sikuli script and manually add or delete parts of it to increase or decrease the number of icons, etc.; he just has to change the nodes under the "Game icons" node.
So, ideally, someone who doesn't know programming would be able to change it without messing with the script.
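
To illustrate the idea that swapping images should be enough, here is a sketch that assumes the "Game icons" branch corresponds to a folder of image files on disk (an assumption, since the nodes could be stored differently). The script never names individual icons; it just lists whatever is in the folder. `list_icons` and the file names are hypothetical.

```python
import os
import tempfile


def list_icons(folder, extensions=(".png", ".jpg")):
    """Return the sorted paths of all image files directly inside folder."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(extensions)
    )


# Demo: a temporary folder standing in for the "Game icons" branch.
icons_dir = tempfile.mkdtemp()
for name in ("starcraft2.png", "chess.PNG", "notes.txt"):
    open(os.path.join(icons_dir, name), "w").close()
icon_names = [os.path.basename(p) for p in list_icons(icons_dir)]
```

Adding or removing a game then means adding or removing a file, and the search loop picks up the new list on its next pass.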

This is the first example I could come up with, but I think there are many applications that would follow a similar pattern. Basically, we can organize the script in such a way that the user, just by replacing the images under a node, has personalized a (potentially) really complex script.

Therefore, I still think the mix of Sikuli and Leo is an interesting one: Sikuli's powerful image tracking plus LeoEditor's data management. I don't need a specific file with the "work mode" and "play mode" links just for this script; it gathers the data from my personal information, which changes frequently, and with this approach the script follows those changes automatically, so there is no need to edit it whenever I add a new link.
This is just an example, but it would work with all kinds of data, so in the end the quickest way is for Sikuli to interact with the data directly, instead of writing a script for every interaction I come up with, storing the specific data for that interaction, and having to update it constantly.

When I started, I was planning on doing this with AHK, since I only use Windows and it is a bit more mature than Sikuli, but Sikuli's far better image recognition capabilities, plus the fact that it is multi-platform, made me focus on it even though it looks a bit more complex to do.
For instance, with AHK I can just tell the script to send text directly to LeoEditor, even while it is an inactive window, and that's it: I have the info in LeoEditor as soon as AHK perceives it. So that would be the quick and easy way.
But then again, Sikuli looks so much more powerful and like a stronger bet in the long term to me.

Thanks again, RaiMan, for your comments. I hope that this makes a bit more sense than my previous posts.