find word in some string read from a file - encoding problem

Asked by fernando gandini

Hi,

I have a string that I copied from Notepad, and I need to find a word in this string. I tried to use .find() from Python, but it does not work.

if (texto.find('gol')):
    popup("sim")
else:
    popup("não")

Can anyone help me, please?

Question information

Language: English
Status: Solved
For: SikuliX
Assignee: No assignee
Revision history for this message
RaiMan (raimund-hocke) said :
#1

str.find()
returns the position of the searched string in the given string, between 0 and len(str) - 1.
If not found, it returns -1.

So "if str.find():" does not work: a match at position 0 evaluates to False, and -1 (not found) evaluates to True.

If you only want to know whether it exists:

if -1 < texto.find('gol'): # True if found

or

if texto.count('gol'): # 0 if not found, which is False
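To make the return values described above concrete, here is a minimal sketch. The sample string is an assumption chosen so that the match starts at position 0; the `in` operator is standard Python and is shown only as another way to ask "does it occur?".

```python
# Works on Python 2 (Jython) and Python 3.
texto = "gol de placa"  # sample text, assumed for illustration

# The match starts at the first character, so find() returns 0.
# 0 is falsy, which is why "if texto.find('gol'):" takes the else branch here.
position = texto.find("gol")

# A word that is not present yields -1, which is truthy.
missing = texto.find("xyz")

# The membership operator answers "does it occur?" directly.
found = "gol" in texto
not_found = "xyz" in texto

print(position, missing, found, not_found)
```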

fernando gandini (fernando-gandini) said :
#2

Hi, thanks for the answer... but now I have another problem.
I get a string from Notepad, and I don't know why .find() still doesn't work.
Here is the code:

xReader = file("C:\\Users\\fgandini\\Desktop\\infos.txt")
for line in xReader:
    line.rstrip("\n")
    infoText.append(line.strip())

if (-1 == infoText[1].find('Física')):
    popup("sim")
    type(Key.TAB)
else:
    type(Key.RIGHT)
    type(Key.TAB)
##########
#text from notepad
Fernando M.D.F Gandini
Física
05439120

Thanks

fernando gandini (fernando-gandini) said :
#3

OK, now I know the real problem...

I have to compare the string that I get from Notepad, "Física", with the "Física" that I put as a string in Sikuli,
but because of the encoding, Sikuli compares "Física" with "FÃ-sica".
How can I solve that?
Thanks
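The garbled form "FÃ-sica" is what typically appears when UTF-8 bytes are interpreted as Latin-1. The sketch below reproduces that effect under this assumption; it is not taken from Sikuli itself. The escape `\xed` stands for the letter í, so the snippet does not depend on the encoding of the script file.

```python
# Works on Python 2 (Jython) and Python 3.
word = u"F\xedsica"  # the word from the file, with \xed standing for the accented i

utf8_bytes = word.encode("utf-8")        # what a UTF-8 file stores: 7 bytes for 6 letters
misread = utf8_bytes.decode("latin-1")   # the same bytes interpreted as Latin-1
correct = utf8_bytes.decode("utf-8")     # the same bytes interpreted as UTF-8

# misread contains two characters (\xc3 and \xad) where the single \xed should be.
print(repr(misread))
print(correct == word)
```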

fernando gandini (fernando-gandini) said :
#4

If I put the string that I get from the file into a popup, the encoding comes out wrong, but if I print it in the console, it appears right.
Why does this happen?
Thanks

j (j-the-k) said :
#5

I have a similar problem with German special characters. You could try setting the encoding by adding this line as the first line of your test:
# -*- coding: utf-8 -*-
But this did not help me in all cases. There seem to be some internal problems with the encoding, but I don't know how to work around them.

RaiMan (raimund-hocke) said :
#6

@ j-the-k
# -*- coding: utf-8 -*-
is no longer needed with Sikuli X in scripts that are run with Sikuli's own run features (it is added internally, automagically).

You are right, this is an encoding problem.

I will come back soon with an answer.

RaiMan (raimund-hocke) said :
#7

--- file has utf-8 encoding already
This is the best case, since you have to do nothing else in your Sikuli scripts.
If you want to have choices and more features when saving, use Notepad++ instead of Notepad (http://notepad-plus-plus.org/).
Coding: UTF-8 without BOM
(a BOM would add 3 additional bytes at the beginning of the file!)

--- file does not have utf-8 encoding
This is normally the case when saving files in Windows with plain Notepad (it uses an extended ASCII encoding called ANSI, for example Latin-1 for Western Europe, depending on the locale).

The following works if there are embedded utf-8 characters in a file that is read as a byte string (normal operation):

infoText = []
xReader = file("C:\\Users\\fgandini\\Desktop\\infos.txt")
for line in xReader.readlines():
    infoText.append(line.strip().encode("utf-8"))
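As an alternative sketch (not RaiMan's method), the standard `codecs.open` function can decode the file while reading, so every line arrives as a unicode string. The helper name `read_lines`, the temporary file, and the choice of `cp1252` as the "ANSI" encoding are assumptions for illustration; the real encoding depends on how the file was saved.

```python
# Works on Python 2 (Jython) and Python 3.
import codecs
import os
import tempfile

def read_lines(path, encoding):
    """Return the stripped lines of a text file as unicode strings."""
    reader = codecs.open(path, "r", encoding=encoding)
    try:
        return [line.strip() for line in reader]
    finally:
        reader.close()

# A temporary file stands in for infos.txt so the sketch is self-contained.
sample = u"Fernando\nF\xedsica\n"  # \xed is the accented i
handle, path = tempfile.mkstemp(suffix=".txt")
os.close(handle)
try:
    writer = codecs.open(path, "w", encoding="cp1252")  # assumed "ANSI" encoding
    writer.write(sample)
    writer.close()
    info_text = read_lines(path, "cp1252")
finally:
    os.remove(path)

print(info_text[1] == u"F\xedsica")
```

For a UTF-8 file that starts with a BOM, CPython's "utf-8-sig" codec can be passed as the encoding to drop the 3 BOM bytes while decoding.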

fernando gandini (fernando-gandini) said :
#8

Hi RaiMan, thanks a lot for the answers...

but now I have another strange problem...

read from UTF-8 file --------
Fernando Gandini
Física
-------------------------------------------------------------------------
infoText = []
xReader = file("C:\\Users\\fgandini\\Desktop\\infos2.txt")
for line in xReader:
    line.rstrip("\n")
    infoText.append(line.strip())

popup(infoText[1])
if (infoText[1] == "Física"):
    popup("certo")
else:
    popup("errado")
print infoText[1]
exit()
---------------------------------------
popup(infoText[1]) ------> appears with the wrong encoding
if (infoText[1] == "Física"): --------> returns true, so the file is read with the correct encoding
print infoText[1] ------------> appears correctly in the console
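One plausible reading of these three observations, offered as an assumption rather than a confirmed diagnosis: in Python 2 both the line from the file and the "..." literal in a UTF-8 script are byte strings holding the same UTF-8 bytes, so the comparison succeeds even though nothing has been decoded. The sketch below models both sides with explicit bytes.

```python
# Works on Python 2 (Jython) and Python 3.
expected = u"F\xedsica"  # \xed is the accented i

from_file = expected.encode("utf-8")    # bytes as read from a UTF-8 file
from_script = expected.encode("utf-8")  # bytes of a plain "..." literal in a UTF-8 script (Python 2)

# Byte-for-byte comparison: succeeds without any decoding taking place.
same_bytes = (from_file == from_script)

# Comparison as text: requires decoding the bytes first.
same_text = (from_file.decode("utf-8") == expected)

print(same_bytes, same_text)
```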

RaiMan (raimund-hocke) said :
#9

popup() has a problem with utf-8 characters (known problem).

this should work:

popup(infoText[1].decode("utf-8"))

as you might have found out already:

popup("Física")

does not work either.
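The decode step can be wrapped in a small helper so that values which are already unicode pass through untouched. This is a sketch only: `to_unicode` is a hypothetical name, and the final `popup` call is left as a comment because it exists only inside Sikuli.

```python
# Works on Python 2 (Jython) and Python 3.
def to_unicode(value, encoding="utf-8"):
    """Return value as a unicode string, decoding byte strings if needed."""
    unicode_type = type(u"")
    if isinstance(value, unicode_type):
        return value
    return value.decode(encoding)

raw = u"F\xedsica".encode("utf-8")  # stands in for a line read from a UTF-8 file
label = to_unicode(raw)

# In a Sikuli script this would be followed by: popup(label)
print(label == u"F\xedsica")
```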

fernando gandini (fernando-gandini) said :
#10

Hey RaiMan, thanks a lot... this works fine.

Thanks again.
Bye

RaiMan (raimund-hocke) said :
#11

Fernando said it is ;-)

j (j-the-k) said :
#12

@RaiMan
Yesterday I created a script with the IDE on Linux, checked it into a Subversion repository, checked it out on a Windows machine and ran it. It had the letters "äüöß" in it and produced a "wrong encoding" exception if # -*- coding: utf-8 -*- was not added to the file. I use rc3.

RaiMan (raimund-hocke) said :
#13

@j-the-k
This is what is added internally to the beginning of every script that is run either from the IDE or from the command line, using either sikuli-ide.jar (as with the .bat's) or sikuli-script.jar via <java -jar sikuli-script.jar some.sikuli>.

         "# coding=utf-8",
         "from __future__ import with_statement",
         "from sikuli import *",
         "setThrowException(True)",
         "setShowActions(False)"

If you use any other method to run your Sikuli Jython scripts (plain Jython, Eclipse, Netbeans or whatever), you have to take care of the file encoding yourself (utf-8 is recommended).

BTW: when adding a comment, that might provoke an answer, pls. subscribe to the question.

j (j-the-k) said :
#14

I ran the script with the sikuli-ide.sh but not directly. I use several Sikuli-modules, and the one with "üäöß" in it is one that is imported by the one I executed. Like this:

<file1.sikuli>
print "äüöß"

<file2.sikuli>
import file1

sikuli-ide.sh -r file2.sikuli

=> encoding problem if file1 does not contain # -*- coding: utf-8 -*-

So maybe the encoding is not added to every sikuli-file or only to the ones that are executed directly and not imported?

RaiMan (raimund-hocke) said :
#15

@ j-the-k
good finding :-)
Yes, that is the difference. The imported scripts/modules are not manipulated, only the main script that is run by Sikuli. That is also the reason why you have to add "from sikuli import *" yourself to scripts that you want to import.

I will add a remark to the docs.
Thanks for evaluating this situation.
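Since imported modules do not get the header automatically, it can help to check which files declare an encoding. The sketch below is a simplified check based on the pattern given in PEP 263; `declared_encoding` is a hypothetical helper, and the sample line lists are made up for illustration.

```python
# Works on Python 2 (Jython) and Python 3.
import re

# Pattern for an encoding declaration, following the one given in PEP 263.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def declared_encoding(source_lines):
    """Return the encoding named in the first two lines, or None."""
    for line in source_lines[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None

# The main script as Sikuli runs it (header added internally).
main_script = ["# coding=utf-8", "from sikuli import *"]
# An imported module without a header, and one with the header added by hand.
imported_module = ["from sikuli import *", "print 'text'"]
with_header = ["# -*- coding: utf-8 -*-", "from sikuli import *"]

print(declared_encoding(main_script))
print(declared_encoding(imported_module))
print(declared_encoding(with_header))
```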