[research] Sikuli over WebDriver API / JSONWireProtocol?

Asked by daluu

Has anyone thought about, or would anyone be interested in, using/working with Sikuli via a WebDriver API (e.g. the JSONWireProtocol)?

I don't think such a project exists yet. I had plans to work on a proof of concept but haven't gotten around to it.

Basically, we would wrap the Sikuli (Java) API in the WebDriver API so that any WebDriver language binding can invoke Sikuli.

In general, we can map the click() and sendKeys() methods and whatever else is applicable; we would only support a subset of the WebDriver API.

Finding elements would be based on captured and saved (PNG) images for Sikuli to use, located with the following WebDriver location strategies:

By name - name of PNG image to find, stored in some default location we define globally

By ID - optional to decide how to implement, could also be alias to by name

By XPath - absolute (or relative) path to the image in the filesystem (probably with no validation of the "XPath"; we just check that the file path exists). In essence, we override XPath to mean a filesystem path rather than a real XPath.

By CSS selector (or any other unused strategy) - override it as a way to send a base64-encoded representation of a PNG image. This is for remote WebDriver / Grid-type test deployments: the client reads a local file into memory and encodes it as base64, and the server decodes it, saves it to a temp location, and uses it with Sikuli to locate the "element". In essence, this is similar to how WebDriver handles file uploads (via sendKeys) and FirefoxProfiles in remote WebDriver/Grid deployments, where the file/profile need not reside on the actual node but can come from the machine sending the WebDriver commands.
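A minimal sketch of how the server side might resolve these overloaded strategies to a PNG file, assuming a configured default image directory. The class and method names are hypothetical; the strategy strings are the `using` values the JSONWireProtocol sends (`name`, `id`, `xpath`, `css selector`), and treating ID as an alias for name is just one of the options listed above. The `encodeImage` helper shows the client half of the base64 round trip.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

/** Hypothetical resolver from a WebDriver locator (strategy + value) to a PNG path for Sikuli. */
final class ImageLocatorResolver {
    private final Path defaultImageDir; // assumed global image repository

    ImageLocatorResolver(Path defaultImageDir) {
        this.defaultImageDir = defaultImageDir;
    }

    Path resolve(String using, String value) throws IOException {
        switch (using) {
            case "name":
            case "id": // treated here as an alias for "name"
                return requireExists(defaultImageDir.resolve(value));
            case "xpath": // overloaded to mean a filesystem path; only existence is checked
                return requireExists(Paths.get(value));
            case "css selector": { // overloaded to carry a base64-encoded PNG
                byte[] png = Base64.getDecoder().decode(value);
                Path tmp = Files.createTempFile("sikuli-", ".png");
                Files.write(tmp, png);
                tmp.toFile().deleteOnExit(); // clean up decoded images when the JVM exits
                return tmp;
            }
            default:
                throw new IllegalArgumentException("Unsupported locator strategy: " + using);
        }
    }

    /** Client side: read a local PNG and encode it for By.cssSelector(...). */
    static String encodeImage(Path localPng) throws IOException {
        return Base64.getEncoder().encodeToString(Files.readAllBytes(localPng));
    }

    private static Path requireExists(Path p) throws IOException {
        if (!Files.exists(p)) {
            throw new FileNotFoundException("Image not found: " + p);
        }
        return p;
    }
}
```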

I think this type of solution offers a good way to integrate Sikuli with Selenium/WebDriver/Appium/ios-driver tools, for system-integration-type tests or wherever those tools fall short and Sikuli can complement them. It would be more powerful and flexible than the (Firefox) SikuliWebDriver (https://code.google.com/p/sikuli-api/wiki/SikuliWebDriver), and it would work over local/remote/Grid deployments.

In essence, this would be building a WebDriver server interface to Sikuli, so that WebDriver clients can call Sikuli as if it were a WebDriver, with minimal client-side code changes.

Let me know what you think of this.

Question information

Language:
English
Status:
Answered
For:
SikuliX
Assignee:
RaiMan
daluu (cuuld) said:
#1

This would be similar in ways to

https://github.com/enix12enix/sikuli-remote-control

https://github.com/enix12enix/sikulirc

except that it's more language-agnostic: it reuses the WebDriver API rather than defining a similar API, and thus can be used with any WebDriver language binding.

As a 3rd-party project this would be cool, but it might be even cooler if integrated into the Sikuli mainline project.

RaiMan (raimund-hocke) said:
#2

sounds very interesting.

... especially

... the "alternative ways" to specify image "names"
... to send image files over the net using base64 encoding
... and generally solutions for remote Sikuli command execution

Could you give a short example of a Sikuli-aware WebDriver workflow and/or just list the WebDriver "commands" for which a Sikuli interface would make sense?

daluu (cuuld) said:
#3

What do you mean by workflow specifically?

In a nutshell, we'd start up a Sikuli-aware Selenium server (one that drives Sikuli, not browsers/Selenium). It could be a JAR file, a binary, or however it is implemented, and it would take optional command-line arguments such as the default image repository location.
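A sketch of parsing such startup options. The `--port` and `--image-dir` flag names and the default image directory are hypothetical; 4444 is simply the port the Selenium server usually listens on.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical startup options for a Sikuli-aware WebDriver server. */
final class ServerOptions {
    int port = 4444; // Selenium server's usual default port
    Path imageDir = Paths.get(System.getProperty("user.dir"), "images"); // assumed default repository

    static ServerOptions parse(String[] args) {
        ServerOptions opts = new ServerOptions();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "--port":
                    opts.port = Integer.parseInt(requireValue(args, ++i, "--port"));
                    break;
                case "--image-dir":
                    opts.imageDir = Paths.get(requireValue(args, ++i, "--image-dir"));
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
        }
        return opts;
    }

    /** Fails clearly when a flag is the last argument and has no value. */
    private static String requireValue(String[] args, int i, String flag) {
        if (i >= args.length) {
            throw new IllegalArgumentException(flag + " needs a value");
        }
        return args[i];
    }
}
```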

This server listens for WebDriver commands (over HTTP via the JSONWireProtocol), decodes each command, maps it to the appropriate Sikuli command, executes that command, and maps the Sikuli return value (if any) back into a WebDriver command response for the client. Any exceptions are propagated back to the WebDriver client, perhaps preprocessed to clean up the error messages so they make more sense WebDriver-style.
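As a sketch of the "decode the command" step, a router could match the JSONWireProtocol endpoints this proposal needs and hand off to Sikuli-side handlers. The endpoint paths are from the JSONWireProtocol; the command names and the `JsonWireRouter` class are hypothetical, and the HTTP listener and JSON body parsing are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical router from JSONWireProtocol method + path to a Sikuli-side command name. */
final class JsonWireRouter {
    /** A matched command plus its captured path parameters (session id, element id). */
    static final class Resolved {
        final String command;
        final List<String> params;
        Resolved(String command, List<String> params) { this.command = command; this.params = params; }
    }

    private static final class Route {
        final String method;
        final Pattern path;
        final String command;
        Route(String method, String regex, String command) {
            this.method = method;
            this.path = Pattern.compile(regex);
            this.command = command;
        }
    }

    private final List<Route> routes = new ArrayList<>();

    JsonWireRouter() {
        routes.add(new Route("POST", "/session/([^/]+)/timeouts/implicit_wait", "setImplicitWait"));
        routes.add(new Route("POST", "/session/([^/]+)/element", "findElement"));
        routes.add(new Route("POST", "/session/([^/]+)/elements", "findElements"));
        routes.add(new Route("POST", "/session/([^/]+)/element/([^/]+)/click", "click"));
        routes.add(new Route("POST", "/session/([^/]+)/element/([^/]+)/value", "sendKeys"));
    }

    /** Full-path match; an empty result means the server should answer "unknown command". */
    Optional<Resolved> resolve(String method, String path) {
        for (Route r : routes) {
            Matcher m = r.path.matcher(path);
            if (r.method.equals(method) && m.matches()) {
                List<String> params = new ArrayList<>();
                for (int g = 1; g <= m.groupCount(); g++) {
                    params.add(m.group(g));
                }
                return Optional.of(new Resolved(r.command, params));
            }
        }
        return Optional.empty();
    }
}
```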

Starting up the Sikuli Selenium server may instantiate a Sikuli instance as needed for the automation.

So that's the general workflow in my mind. As for example WebDriver command usage:

//setting default Sikuli timeouts (e.g. for finding images) via WebDriver
driver.manage().timeouts().implicitlyWait(60, TimeUnit.SECONDS);
//there could be other examples, the above is just one, probably the primary one

//not sure whether there is any use or feasibility in "findElements" with Sikuli, but we could support findElement

driver.findElement(By.name("WindowsStartMenu.png")).click(); //finds image from default repository location

driver.findElement(By.xpath("C:\\Test\\RunDialogTextField.png")).sendKeys("notepad");

driver.findElement(By.cssSelector("base64encodedImageStringHere")).click();

I would assume a user of the Sikuli (Java) API could easily infer what the above WebDriver commands should translate to as Sikuli API calls. For the base64 image case, though, extra processing is needed: decode it into a binary image file stored in a temp directory before passing it to the Sikuli command.
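A sketch of that translation layer, assuming a hypothetical `ImageDriver` interface in front of Sikuli so the mapping can be tested without a screen. A real implementation might back it with SikuliX's `Screen`/`Region` methods such as `setAutoWaitTimeout`, `click` and `type`; that wiring is an assumption and is not shown. It also shows two conversions the server would need: the JSONWireProtocol sends `implicit_wait` as `{"ms": ...}` while Sikuli's wait timeout is in seconds, and the `value` payload for sendKeys is an array of strings.

```java
import java.nio.file.Path;

/** Hypothetical abstraction over the Sikuli calls the server needs. */
interface ImageDriver {
    void setAutoWaitTimeout(double seconds);
    void click(Path image);
    void type(Path image, String text);
}

/** Translates decoded WebDriver commands into ImageDriver calls. */
final class SikuliCommandTranslator {
    private final ImageDriver sikuli;

    SikuliCommandTranslator(ImageDriver sikuli) {
        this.sikuli = sikuli;
    }

    /** implicitlyWait(60, SECONDS) arrives as {"ms": 60000}; convert to seconds. */
    void implicitWait(long ms) {
        sikuli.setAutoWaitTimeout(ms / 1000.0);
    }

    void click(Path image) {
        sikuli.click(image);
    }

    /** The "value" array from sendKeys is joined into one string before typing. */
    void sendKeys(Path image, String[] value) {
        sikuli.type(image, String.join("", value));
    }
}
```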

There could be more use cases beyond click(), sendKeys() and setting timeouts, but I haven't used Sikuli enough, nor thought through in detail which WebDriver APIs to support and which not. Still, this is a good starting reference for a proof of concept, no?

In terms of remote Sikuli execution, it is already remote when using WebDriver. The user instantiates a RemoteWebDriver client, which connects to the Sikuli-aware Selenium server; the server listens for client requests and proxies them to and from Sikuli (running locally on the machine hosting the server). That way it can be used remotely, locally (localhost), and in Selenium Grid deployments.

We're basically building the Sikuli-aware Selenium server, replacing the internal Selenium code that manipulates browsers once the WebDriver command is decoded with Sikuli commands instead. This work could be based on modifying the Selenium server codebase or perhaps other WebDriver API based servers like Appium or ios-driver, whichever is easier and better to do.

daluu (cuuld) said:
#4

Also in terms of finding images or checking if something (image) exists with Sikuli, in WebDriver that would simply be

WebElement result = driver.findElement(By.name("imageA.png"));

List<WebElement> results = driver.findElements(By.name("imageB.png"));

where Sikuli would return a reference to the found image encapsulated/defined as a WebElement (as defined by WebDriver API / JSONWireProtocol).

For findElement, if the image is not found, we throw an exception (passing the exception from Sikuli up to WebDriver, massaging the message as needed).

For findElements, if nothing is found, we return an empty list of WebElements; no exception is ever thrown for findElements. If found, we return the result (e.g. a 1-element list). I don't know whether Sikuli can find multiple matches; if it can, we return all those references as a list of WebElements.
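A sketch of these find semantics as JSONWireProtocol responses. The status codes (0 = Success, 7 = NoSuchElement) and the `{"ELEMENT": id}` reference shape come from the protocol; the `Match` class, the sequential element IDs and the error message are hypothetical. On the multiple-matches question, SikuliX's `Region.findAll` does return all matches, which is what `findElements` would wrap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical findElement/findElements handlers over image matches. */
final class ImageElementFinder {
    static final class Match {
        final int x, y, w, h;
        Match(int x, int y, int w, int h) { this.x = x; this.y = y; this.w = w; this.h = h; }
    }

    // Found matches are kept under their element ID so later click/sendKeys can use them.
    private final Map<String, Match> elements = new LinkedHashMap<>();
    private int nextId = 0;

    private String register(Match m) {
        String id = String.valueOf(nextId++);
        elements.put(id, m);
        return id;
    }

    /** findElement: first match, or a NoSuchElement (status 7) error response. */
    String findElement(String sessionId, List<Match> matches) {
        if (matches.isEmpty()) {
            return response(sessionId, 7, "{\"message\":\"Unable to locate image on screen\"}");
        }
        return response(sessionId, 0, ref(register(matches.get(0))));
    }

    /** findElements: every match; an empty list (never an error) when nothing is found. */
    String findElements(String sessionId, List<Match> matches) {
        List<String> refs = new ArrayList<>();
        for (Match m : matches) {
            refs.add(ref(register(m)));
        }
        return response(sessionId, 0, "[" + String.join(",", refs) + "]");
    }

    private static String ref(String id) {
        return "{\"ELEMENT\":\"" + id + "\"}";
    }

    private static String response(String sessionId, int status, String valueJson) {
        return "{\"sessionId\":\"" + sessionId + "\",\"status\":" + status + ",\"value\":" + valueJson + "}";
    }
}
```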

And click() and sendKeys() are basically a chained call: findElement(s) first, then invoke click/sendKeys against the found element reference and get back any return value.
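A sketch of that chaining on the server side: the element ID returned by an earlier find is looked up, and the action is applied to the center of the matched region. `Region`, `MouseKeyboard` and the ID map are hypothetical. Returning status 10 (StaleElementReference) for an unknown ID is a design choice here, not something the protocol prescribes for this case.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical element-level actions on previously found image regions. */
final class ElementActions {
    static final class Region {
        final int x, y, w, h;
        Region(int x, int y, int w, int h) { this.x = x; this.y = y; this.w = w; this.h = h; }
        int centerX() { return x + w / 2; }
        int centerY() { return y + h / 2; }
    }

    /** Stand-in for the low-level input calls a Sikuli backend would make. */
    interface MouseKeyboard {
        void clickAt(int x, int y);
        void typeText(String text);
    }

    private final Map<String, Region> elements = new HashMap<>();
    private final MouseKeyboard input;

    ElementActions(MouseKeyboard input) {
        this.input = input;
    }

    void remember(String elementId, Region r) {
        elements.put(elementId, r);
    }

    /** Returns a JSONWireProtocol status: 0 on success, 10 if the ID is unknown. */
    int click(String elementId) {
        Region r = elements.get(elementId);
        if (r == null) {
            return 10;
        }
        input.clickAt(r.centerX(), r.centerY());
        return 0;
    }

    /** sendKeys: click the element to focus it, then type the text. */
    int sendKeys(String elementId, String text) {
        int status = click(elementId);
        if (status == 0) {
            input.typeText(text);
        }
        return status;
    }
}
```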

RaiMan (raimund-hocke) said:
#5

ok, understood.

Not my current priority, but if you do anything in this direction, come back if you have any requests about what Sikuli should do or have, to support this approach.

daluu (cuuld) said:
#6

Well, I took a shot at it, but haven't succeeded yet. I'll continue to work on it as I have time and when in the mood. The original attempt uses Python running via Jython, with the WebDriver server base code taken from old Appium code and retrofitted for the Sikuli APIs. I can't get it fully running yet.

I have an alternate idea: implement it in pure Java using ios-driver's server component or the Selenium project's Selenium server component, gutting the internals and replacing them with Sikuli Java API calls. But those projects have more files to dig through to gut out and retrofit, so I'll look into that at a future date.

Anyhow, the Python code that's available now, along with sample demo code (also in Python) that uses the WebDriver client bindings to call this server and issue Sikuli commands over the WebDriver API, should suffice as a prototype of how you would implement Sikuli over the WebDriver protocol. The sample demo code would work (somewhat) once I get the server portion working.

https://github.com/daluu/SikuliDriverServer

See the wiki section of the repo for additional details that are more user-centric than the code itself.

And for those curious, if you want to compare against working implementations of desktop GUI test tools adapted for WebDriver, see these other servers I got working:

https://github.com/daluu/AutoItDriverServer

https://github.com/daluu/AutoPyDriverServer

AutoPyDriverServer is closer to the behavior of Sikuli, though I must say I prefer Sikuli in terms of usability/accuracy, from what I remember when last using it.

And then there's also https://github.com/appium/appium-for-mac for Mac UI automation, though that's not a good candidate to look at unless you like Xcode and Objective-C.

So, to end this, I'd also welcome any community contributions to improve the current code that I have for SikuliDriverServer that I haven't gotten working yet. ;-)

RaiMan (raimund-hocke) said:
#7

ok - thanks for the update.

Now watching your SikuliDriverServer.

daluu (cuuld) said:
#8

Wanted to post an update related to this: while not exactly the same scope, there is now Sikuli-ish functionality for mobile testing with Appium, at least with respect to finding elements by image. Thanks to a suggestion I made a long time back, it recently got implemented.

https://appiumpro.com/editions/32
