Copy a text after a search

Asked by charliedaps

Hello everyone,

I have to run several searches in code and in text.

I open the search function of the software (Word for one part of the project and a browser for another) and paste the text I'm looking for.

Up to that point everything is fine and the search works. But then I do not know how to continue, because I have to select and copy part of the text that follows the hit, and I do not know in advance exactly which words that will be.

So in a few words.

The search finds me:

XXXXX

I would like Sikuli to select the rest of the sentence (that is, up to the next punctuation mark).

Then

copy and paste it into Excel (that part I know how to do ;)).

Do you have an idea how to select part of the text that follows a search hit?

Thank you in advance for your help.

Have a good day

Question information

Language: English
Status: Answered
For: SikuliX

Mike (maestro+++) said :
#1

If you know how to do it in Excel, then why not Ctrl-C the contents of the field in which you found the search string and Ctrl-V the text into Excel? You can then decide how much of the field text you want to keep, using your normal algorithms. Then copy and paste again into the target application.

charliedaps (charliedaps) said :
#2

Hello,

Thank you for your answer and help.

In fact, I do not know how to make Python or Sikuli select part of the text.

I search for "image" in the source code of a page.

So "image" will be found, but what I want copied is the code that comes after the word "image".

How can I tell Sikuli to take the characters after the word "image" and stop at the end of the sentence?

Thank you

Mike (maestro+++) said :
#3

I guess what you are saying is that you find XXXXX, but it is not in a field that you can copy and paste. So you want to track forwards and backwards to pick up all the text from the preceding <start of sentence> to the following <end of sentence>. You may have to cope with carriage returns, i.e. the text search has to move up/down and across the page. SikuliX (as distinct from the underlying Tesseract OCR engine) only picks up text from a provided region, i.e. you have to know the region before you can get the text. Are you able to guess the region surrounding the XXXXX that you have found?

charliedaps (charliedaps) said :
#4

Hello,

Thank you for your answer.

Since there are two types of text, I will start with the first; it will be easier.

I have to search in HTML code.

The only thing I can do is display the code.

I press Ctrl+F and search for the word "img".

The browser highlights the word "img" in orange, so I can use find or exists with an image of the highlighted word "img".

Like this:

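# Open the page source in the browser (the address bar must have focus)
paste("view-source:http://www.xxx.xx")
type(Key.ENTER)
wait(2)
# Open the browser's search box and search for the text
type("f", KeyModifier.CTRL)
paste("/img/")
# Locate the highlighted hit on screen and click it
match = find("1543319910796.png")
click(match)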

But after that, how do I select the rest of the sentence?

RaiMan (raimund-hocke) said :
#5

Not really a job for SikuliX's visual features.

When working with web content, Selenium should be the first choice, since it is able to "look into" the web elements. It is often combined with SikuliX.

But in your case it is much simpler: the Chrome extension "view-source" downloads the page content as an .htm file anyway (it should be downloaded to the standard download folder as view-source_xxx.xx.htm).

The script can simply read this file into a string, and then you have tons of standard Python string functions to search through the text.

... and there are packages/libraries available, that can break/parse the HTML structure.
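
A minimal sketch of the read-the-file-and-search idea, not a definitive solution: the helper names read_text and text_after, the UTF-8 encoding, the sample HTML and the choice of stop characters are all assumptions made for illustration. It uses only str.find and slicing, so it should also run under the Jython 2.7 interpreter that SikuliX scripts use.

```python
import io


def read_text(path):
    # Hypothetical helper: read the downloaded page source into one string.
    # io.open behaves the same in Python 2.7 (Jython) and Python 3.
    # The UTF-8 encoding is an assumption about the downloaded file.
    with io.open(path, encoding="utf-8") as f:
        return f.read()


def text_after(content, marker, stop_chars):
    """For every occurrence of marker, collect the text that follows it,
    up to (not including) the first character contained in stop_chars."""
    results = []
    pos = content.find(marker)
    while pos != -1:
        start = pos + len(marker)
        end = start
        # Walk forward until a stop character or the end of the text
        while end < len(content) and content[end] not in stop_chars:
            end += 1
        results.append(content[start:end])
        # Continue searching behind the marker just handled
        pos = content.find(marker, start)
    return results


# Made-up sample standing in for the page source; with a real download
# one would use something like: content = read_text("view-source_xxx.xx.htm")
sample = u'<p>Logo</p><img src="/img/logo.png" alt="x"><img src="/img/photo.jpg">'
print(text_after(sample, u"/img/", u'"\'> '))
```

With the sample above, the two hits are the file names that follow "/img/" (logo.png and photo.jpg). Stopping at a quote, a space or ">" is just one plausible definition of "the rest" for HTML attributes; other stop characters can be passed in as needed.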

charliedaps (charliedaps) said :
#6

Thank you for your answer.

I had not thought of downloading the page source, but that is actually what I already did for another project: searching in the source of a page.

Can you recommend libraries that do what I'm looking for?

I am really a beginner, and it seems very complicated when I look at the help sites.

Thank you

RaiMan (raimund-hocke) said :
#7

Again: if it is only about extracting text from a web-page, then the SikuliX environment is surely not the right tool, since the Python scripting is done by the Java based Jython interpreter, which is far behind the actual Python interpreter.

I checked some XML/HTML parsers available for Python: most of them are not usable with Jython, and the usable one did not work.

So I recommend starting with the solution that downloads the page content to a file, reads that file and then extracts the text using the standard string features.

I am sorry, but I do not have the time to write your scripts.

charliedaps (charliedaps) said :
#8

Ok thanks for your explanations and details.

I will look into the standard string features.

thank you so much
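
For the plain-text case from the original question (take what follows the search term and stop at the next punctuation sign), the standard string features could be used roughly as follows. This is only a sketch: the function name rest_of_sentence is made up, and treating '.', '!' and '?' as the end of a sentence is an assumption. It works on a string that the script already holds, however that text was obtained (for example copied from the document through the clipboard).

```python
import re


def rest_of_sentence(text, search_term):
    """Return the text between search_term and the next '.', '!' or '?'
    (an assumed definition of "end of sentence"), or None if search_term
    does not occur in text."""
    # re.escape makes sure the search term is taken literally
    pattern = re.escape(search_term) + r"([^.!?]*)"
    match = re.search(pattern, text)
    if match is None:
        return None
    # Drop the blanks or line breaks around the captured part
    return match.group(1).strip()


# Made-up sample text; XXXXX stands for the search term as in the question
text = u"First sentence. The XXXXX is followed by these words. Another one."
print(rest_of_sentence(text, u"XXXXX"))
```

Because the character class also matches line breaks, a sentence that wraps over several lines is still captured up to its closing punctuation sign.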
