[research] Sikuli easy interaction with any software/OS

Asked by Fidel

Hi:
I was wondering whether a script exists that does the following, or whether this approach would be practical at all, since I'm considering it for my own purposes. I should warn you that I am self-taught in programming, so I may lack some basic understanding.

Script objective: a simple way to interact with anything. For this example I'm going to say "Python", but my understanding is that it could be any other programming language.

How to:

The Sikuli script has a folder with several TXT files, each of them referring to a different action. For instance:

- List of images the user is searching for, by path. Possibly with extra syntax to set the precision for each image.
- Images that the user was searching for and that have been found.
- Text Sikuli must send as soon as possible.
- Etc.

So when the user changes the file "Images the user is searching", the Sikuli script is refreshed and will look for those images.
When the script finds a match, it writes it to "Images found", with details (position, etc.).

This way, an external script, for instance in Python, can keep checking the "Images found" file and work with it. It can also update "Images searching" so that Sikuli includes the new images in its search.

A txt file can exist for each function Sikuli has.

This way, any user could interact with Sikuli from their own scripts (written in any other language) very simply, just by writing and reading those txt files. By adding the images they want to search for to the txt file, or removing them from it, the Sikuli script would update itself to whatever the other program currently needs, in an "order queuing" fashion.

Also, this way computers could be remotely controlled very simply through Dropbox, since each time a txt file changes, the script will execute a new order and report back in those txt files.
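
A minimal sketch of the external (Python 3) side of this exchange, under stated assumptions: the file names `images_searching.txt` and `images_found.txt`, the one-path-per-line layout, and the one-JSON-object-per-line result format are all hypothetical, since the post does not fix any format. The Sikuli side would need matching read/write code.

```python
import json
import os
import tempfile

# Hypothetical file names; the post does not fix any naming scheme.
SEARCH_FILE = "images_searching.txt"
FOUND_FILE = "images_found.txt"


def write_search_list(folder, image_paths):
    """Replace the list of images the Sikuli side should look for."""
    target = os.path.join(folder, SEARCH_FILE)
    # Write to a temporary file first, then rename it, so the reader
    # never sees a half-written list.
    fd, tmp = tempfile.mkstemp(dir=folder)
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(image_paths) + "\n")
    os.replace(tmp, target)


def read_found(folder):
    """Return the matches reported so far (assumed: one JSON object per line)."""
    target = os.path.join(folder, FOUND_FILE)
    if not os.path.exists(target):
        return []
    with open(target) as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Writing to a temporary file and renaming it is one way to avoid the other process reading a partially written file; whether that matters depends on how often both sides touch the files.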

I understand perhaps this is not something valuable for everybody, so I would appreciate some pointers on how to begin to work on this.

Thank you for Sukuli!

Question information

Language: English
Status: Solved
For: SikuliX
Assignee: RaiMan
Solved by: RaiMan

Roman Podolyan (podolyan-roman) said :
#1

Well, why don't you try the thing you describe:
- read a text file with some strings (image names)
- try to search for them
- and write the results

If that succeeds, you then have to look for third-party tools that can monitor folders/files for changes and run a script when necessary.

Fidel (fidelperez) said :
#2

Ok, I will, and post the results.
Yes, I think pretty much any programming language can read and write txt files, and watch for changes with a loop.
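
Watching for changes with a loop could look roughly like the sketch below. It polls the file's modification time; the function name `wait_for_change` and its parameters are made up for illustration, and polling is only one option (third-party file watchers are another).

```python
import os
import time


def wait_for_change(path, last_mtime, poll_seconds=0.5, timeout=None):
    """Block until the file's modification time differs from last_mtime.

    Returns the new modification time, or None if the timeout expires.
    """
    deadline = None if timeout is None else time.time() + timeout
    while True:
        try:
            mtime = os.path.getmtime(path)
        except OSError:
            mtime = None  # the file does not exist (yet)
        if mtime is not None and mtime != last_mtime:
            return mtime
        if deadline is not None and time.time() >= deadline:
            return None
        time.sleep(poll_seconds)
```

A caller would keep the returned value and pass it back in as `last_mtime` on the next call, re-reading the file each time the function returns a new timestamp.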

Best RaiMan (raimund-hocke) said :
#3

I do not really understand the approach:
- a Sikuli script is a text file
- and it contains the image file names

So why an additional text file?

Just having a bunch of image files and checking whether they can be found is not the real problem, because you always need the information/setup of the screen (the visual context) to have a chance of finding the image.

This is, in the end, what makes up a Sikuli script: a workflow simulating user interactions that drive the application state from one point to the next, using images on the screen to act on or to check the current status.

There are 2 things that are currently missing in Sikuli:
- a recorder that intelligently observes user actions and produces a runnable workflow as a Sikuli script
- a top-level DSL (domain-specific language) to define a visual workflow including decisions and repeats

Fidel (fidelperez) said :
#4

Hi RaiMan:

I found it *really* interesting that you mentioned "a recorder, that intelligently observes user actions and produces a runnable workflow as Sikuli script", since my objective is something close to that. You said you don't understand the approach, yet you guessed where my work is heading, and I was very impressed by that.

To put it *very* briefly, what I am trying to do with this approach is real-time interaction between Sikuli and a Python script.
I am using an outliner (Leo Editor) to dynamically generate Python scripts, which are also run dynamically.
So, at a certain point, the Python script will need to find "Image A and Image B", because it needs them to do something else.
A few seconds later, and not for any Sikuli-related reason, it might need to find "Image C and D", and the first two are no longer needed.
Instead of creating two Sikuli scripts and then running them, the cleanest way I could imagine for this sort of dynamic real-time interaction was to feed a Sikuli script txt files containing the image paths and info required at that moment, and to extract the resulting data through data files as well.
This way the information goes systematically in and out, and that is the best way to let it interact programmatically with dynamic scripts: a structured, scalable input-output system.
This approach would avoid having to reload Sikuli every time the image requirements change or are updated.

Ideally, the required information about the screen setup would be picked up by the Sikuli script at startup. Maybe I'm not understanding this correctly.

This would actually be like using Sikuli as the "eyes" of a multi-level structure (the outliner I mentioned).
I want to add Sikuli to the outliner-based dynamic programming scheme I'm creating, since both will then benefit from each other:
Sikuli will gain the advantages that my bigger project can bring (such as dynamic and interactive script creation), and my project will gain the enhanced powers that Sikuli brings to it.
I'm still at the beginning of this project, but I'm starting to do some interesting things with it. I intend to release it as open source as soon as I have something worth showing.

On a side note (not closely related to the post, but it might give some more information), this is also how I imagine a robot would work with images. At a certain point it would be searching for specific images. It would try to solve the problem it currently has by other means (other sensors, sound, touch, etc.), and when it solves that problem it would go on to the next. Instead of needing new scripts to be compiled on the robot in real time, it only needs to specify the "image paths" it is searching for, which requires far fewer resources.
Well, I don't know much about robotics, so this is just a personal view. I let those thoughts guide me towards my interactive programming system, though.

Thank you, RaiMan, for your answer, and please forgive any misconceptions about the way Sikuli works or about the way I'm doing this. I'm working hard to understand both better every day.

Fidel (fidelperez) said :
#5

Thanks RaiMan, that solved my question.

RaiMan (raimund-hocke) said :
#6

Since I do not think that your question is solved, here are some comments:

--- Leo Editor
... is an approach that allows you to "outline" a workflow (program flow) in the way people are used to when working with larger structured documents: the details are hidden behind a top-level structure that consists only of headlines. In the case of Leo, the details are dynamically created Python script snippets. Leo itself is scriptable using Python, so you would be able to add any features you need or want, or modify existing ones.

--- following a list of images ...
... would be some kind of "robot" that tries to find its way through a visual world by looking for the next image match around it, moving to that point and then looking for the next image.
But in Sikuli terms, the robot would do a bit more: each image has an additional aspect, namely an action to be performed if the robot "sees" the image.
Example: the next image might be that of a door, and the related actions might be one or more of: open, close, knock, go through, wait until it opens, break it so it stays open for ever, ...
This is exactly what you can do with Sikuli: wait for some image to appear and then click it or something else, to get things moving to the next visual state.
So in the end each Sikuli script is a workflow (or, speaking in Leo terms, an outline where the details are not hidden).

--- using Sikuli features with Leo ...
... will not work out of the box: Sikuli scripts are written in the Python language, but the interpreter used is Jython (a Java-based implementation of a Python interpreter at language level 2.5). So you will have the same problems using Sikuli in this context as when trying to use Sikuli with Python: it is not possible to use the Sikuli features directly (tightly coupled via API calls).
The possible solutions are all loosely coupled and based on some form of inter-process communication (batch scripts, running scripts in subprocesses, using some kind of RPC, ...).
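
As an illustration of the "running scripts in subprocesses" option, here is a rough sketch of the calling side in Python 3. The JSON-over-stdin/stdout exchange is an assumption, and the worker is a stand-in Python one-liner: the actual command line for starting a Sikuli script is deliberately not shown here.

```python
import json
import subprocess
import sys


def run_worker(command, request):
    """Run a separate process, send it one JSON request on stdin,
    and return the JSON reply it prints on stdout."""
    completed = subprocess.run(
        command,
        input=json.dumps(request),
        capture_output=True,
        text=True,
        check=True,  # raise if the worker exits with an error
    )
    return json.loads(completed.stdout)


# Stand-in worker: echoes the requested images back as "found".
# A real setup would start the Sikuli script here instead.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys, json; req = json.load(sys.stdin); "
    "print(json.dumps({'found': req['images']}))",
]
```

Calling `run_worker(STAND_IN, {"images": ["a.png"]})` starts one process per request, which is simple but pays the startup cost every time; a long-running worker or an RPC layer would avoid that.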

*** Conclusion: combining Sikuli with Leo does not make sense, since it moves the complexity to a higher level than needed. Your ideas can be implemented simply by using the Sikuli IDE. It is up to you to use an outline approach when creating your scripts (which would in fact be a top-down approach).

Fidel (fidelperez) said :
#7

First of all, RaiMan, I really appreciate the time you are taking to answer me. It is very valuable to me, and I think I'm learning a lot from studying your comments.

Second, here is a bit of explanation of why I don't completely agree (yet) with your conclusion.
The fact is that Leo can be used as a very powerful personal information manager (PIM) and data manager, because the nodes can also be links to websites, files, etc.
It can also move the information for you (so if you move one branch of the outline to another place, the files will also move if you want them to), etc.

Given that background, one of my plans is to support that PIM function with visual functions.

A small example of what I want to do:

Say I want to keep track of how much time I spend playing, and how much working, and I want this to be done automatically. So:

- Whenever Sikuli detects a web browser is open, it will copy the website link and communicate it to LeoEditor.
- Whenever Sikuli detects the Starcraft II Icon (Image) it will activate play mode.
- Whenever Sikuli detects LeoEditor (Image) it will activate work mode.

One of the branches of my outline in Leo Editor holds the website links; some of them are tagged "Work mode" and others "Play mode".
So when it receives the information from Sikuli that website XXX was opened, it searches for the link among my stored ones and activates the corresponding mode.
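
The lookup on the Leo side could be sketched as follows. The dictionary stands in for the tagged branch of the outline, and matching by URL prefix is an assumption; how the links would actually be read from Leo nodes is not shown.

```python
# Hypothetical tagged links, standing in for one branch of the outline.
TAGGED_LINKS = {
    "https://example.com/docs": "work",
    "https://example.com/games": "play",
}


def mode_for_url(url, tagged_links):
    """Return the mode of the stored link that the URL starts with,
    or None when no stored link matches."""
    # Try the longest stored link first so more specific entries win.
    for link in sorted(tagged_links, key=len, reverse=True):
        if url.startswith(link):
            return tagged_links[link]
    return None
```

With this, `mode_for_url("https://example.com/docs/page", TAGGED_LINKS)` gives `"work"`, and a URL that is not stored gives `None`, which the caller could treat as "leave the mode unchanged".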

In another branch, I keep the icons of the games I play, so when Sikuli is searching for game icons, it knows those are the images it has to search for.

This way, if I share this program (and assuming the new user also has "Work mode" and "Play mode" tags on his Leo web links), he will only have to go to the "Game icons" branch and replace the images with those of the games he plays.
He won't have to edit the Sikuli script and manually add or delete parts of it to increase or decrease the number of icons, etc.; he just has to change the nodes under the "Game icons" node.
So, ideally, someone who doesn't know programming would be able to change it without messing with the script.

This is the first thing I could come up with, but I think there are many applications that would follow a similar pattern. Basically, we can organize the script in such a way that, just by replacing the images under a node, the user has personalized a (potentially) really complex script.

Therefore, I still think the mix of Sikuli and Leo is an interesting one: Sikuli's powerful image tracking plus Leo Editor's data management. I don't need a specific file with the "work mode" and "play mode" links just for this script; it gathers the data from my personal information, which is usually changing, and with those changes the script is automatically updated through this approach, with no need to edit the script again whenever I add a new link.
This is just an example, but it would work with all kinds of data, so in the end the quickest way is for Sikuli to interact directly with that data, instead of making a script out of every interaction I come up with, storing the specific data for that interaction, and having to update it constantly.

When I started, I was planning on doing this with AHK, since I only use Windows and it's a bit more mature than Sikuli, but Sikuli's far better image recognition capabilities plus its multi-platform nature made me focus on it, even though it looks a bit more complex to do.
For instance, with AHK I can just tell the script to send text directly to Leo Editor, even while it is an inactive window, and that's it: I get the info in Leo Editor as soon as AHK perceives it. So that would be the quick and easy way.
But then again, Sikuli looks so much more powerful, and a stronger bet in the long term, to me.

Thanks again, RaiMan, for your comments. I hope this makes a bit more sense than my previous posts.

RaiMan (raimund-hocke) said :
#8

Thanks for the kind feedback.
I take time for things I am interested in ;-)

I fully understand your approach and agree.

But again: Leo is only one approach to taking the outline of a workflow to a higher level of abstraction. And it is not the easiest one with respect to implementing the Sikuli features.

One of the former developers of Sikuli (Tom Yeh) has taken an interesting approach to outlining a visual workflow:
http://slides.sikuli.org
One workflow step = one slide of a presentation

What I am after is a generalized language that allows visual workflows to be written down in more natural language, where some keywords define the workflow and the possible actions.

Your example, for instance:

when you see image1, then I am in work mode
when you see image2, then I am in play mode
when I am in work mode, then accumulate the time as work time
when I am in play mode, then do nothing
when I shutdown, show my work time and quit

When you first run this script, you will be asked for the 2 images and a name for the customized workflow.

The generated Sikuli script might look like this:

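import time

workMode = False
playMode = False
isShutdown = False  # nothing in this sketch sets it; a shutdown hook would
workTime = 0
startTime = 0

def startTimer():
    # remember when the current work period began
    global startTime
    startTime = time.time()

def stopTimer():
    # add the elapsed work period to the total
    global workTime
    workTime = workTime + time.time() - startTime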

# to get in one of the 2 modes at startup
while not isShutdown:
    if exists(image1):
        workMode = True
        startTimer()
        break
    elif exists(image2):
        playMode = True
        break

# running
while not isShutdown:
    if workMode and exists(image2):
        workMode = False
        stopTimer()
        playMode = True
        continue
    if playMode and exists(image1):
        playMode = False
        workMode = True
        startTimer()
        continue

# shutdown requested
popup("work time = " + str(workTime))

... and this script could be generated in any Sikuli-capable scripting language or even as a Java class.
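
The same mode-switching logic can also be sketched so that it runs outside Sikuli, with the screen check and the clock passed in as plain functions. The class name `ModeTracker` and its methods are hypothetical; inside a Sikuli script, `exists` would be Sikuli's own function and `clock` would be `time.time`. One deliberate difference from the script above: `shutdown()` also closes a work period that is still running.

```python
class ModeTracker:
    """Accumulates work time from repeated 'which image is visible' checks."""

    def __init__(self, exists, clock):
        self.exists = exists      # callable(image) -> truthy if visible
        self.clock = clock        # callable() -> current time in seconds
        self.mode = None          # None, "work" or "play"
        self.work_time = 0.0
        self._started = None

    def step(self, work_image, play_image):
        """Check the screen once and update the mode and the work time."""
        if self.mode != "work" and self.exists(work_image):
            self.mode = "work"
            self._started = self.clock()
        elif self.mode != "play" and self.exists(play_image):
            if self.mode == "work":
                self.work_time += self.clock() - self._started
            self.mode = "play"

    def shutdown(self):
        """Close any open work period and return the total work time."""
        if self.mode == "work":
            self.work_time += self.clock() - self._started
            self.mode = None
        return self.work_time
```

Passing `exists` and `clock` in makes it possible to try the logic with a fake screen and a fake clock before running it against real images.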