encoding problem when exporting log

Asked by Timmie

Hello,
how do I export the log of a repository to a file with UTF-8 encoding if it contains characters such as ü and ä?

What I tried without success:

#--- Changelog
bzr_logfile_path = os.path.join('_static', 'bzr_log.txt')
target_encoding = 'utf-8'
bzr_logfile = codecs.open(bzr_logfile_path, 'w', encoding=target_encoding)
p_log = subprocess.Popen(('bzr log --short'),
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE, bufsize=-1)
(stdout, stderr) = p_log.communicate()
bzr_logfile.write(stdout)
bzr_logfile.close()

Now, I get this error:
"UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 2871: ordinal not in range(128)"

The best would be to have something like:

bzr version-info --format python > library/log.py

P.S.: I would like to use the log file to automatically include this as a changelog in a Sphinx based documentation.

Any hints are welcome, thanks in advance!

Question information

Language:
English
Status:
Answered
For:
Bazaar
Assignee:
No assignee

This question was reopened

Revision history for this message
Martin Pool (mbp) said :
#1

If you look in ~/.bzr.log, I suspect you'll find your locale is set to one that wants ASCII output, not UTF-8. If you change that, bzr should no longer give that message.

However, we probably shouldn't be giving this error anyhow. Can you please file a bug about it, including the traceback from ~/.bzr.log for the UnicodeDecodeError?

Or is that error in fact being raised in your driver program?

Also, why send the stdout through a pipe rather than just directly to the output file?

Timmie (timmie) said :
#2

Hello,
thanks for your explanations.

I sometimes work on Windows. There I couldn't find the ~/.bzr.log.

Is it possible to set this on a per-repository or per-directory basis, just like .bzrignore?

The error was raised when saving and later when reading the output of bzr log.

Timmie (timmie) said :
#3

> P.S.: I would like to use the log file to automatically include this as a changelog in a Sphinx based documentation.
I did not write it directly to a file because I wanted to see in IPython where the encoding issue occurred.

When I write the output directly to a file on the file system and open that file in an editor, the umlauts are not shown correctly but are replaced by strange characters.

My code snippet:

### CODE ###

#--- BZR: changelog information
def write_changelog_bzr(repo_path, output_dir,
                                        output_file='bzr_revision_log.txt',
                                        target_encoding='utf-8'):

    bzr_logfile_path = os.path.join(output_dir, output_file)
    bzr_logfile = codecs.open(bzr_logfile_path, 'w', encoding=target_encoding)
    p_log = subprocess.Popen(('bzr log --short'),
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE, bufsize=-1)
    (stdout, stderr) = p_log.communicate()
    bzr_logfile.write(stdout)
    bzr_logfile.close()
    #UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 2871: ordinal not in range(128)

    # like bzr version-info --format python > vers_test.py

repo_path = os.path.join('..', '.')
output_dir = os.path.join('.')
write_changelog_bzr(repo_path, output_dir, output_file='changelog.txt')

This gives the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 3091: ordinal not in range(128)
WARNING: Failure executing file: <sphinx_tools.py>

It does not pop up in bzr.log because it is not seen as a failure by bzr.

A bug was filed:
https://bugs.launchpad.net/bzr/+bug/340394
But the description is not yet complete.

Please give me a hint how to get around the encoding issue.

Thanks in advance.

Timmie (timmie) said :
#4

Hello,
does anyone have an idea?
Would you need more information from my side?

I still cannot export the log in a UTF-8 or ASCII format...

Wouter van Heyst (larstiq) said :
#5

On Thu, Mar 12, 2009 at 08:54:47AM -0000, Tim wrote:
> Question #63601 on Bazaar changed:
> https://answers.edge.launchpad.net/bzr/+question/63601
>
> Tim gave more information on the question:
> Hello,
> does anyone have an idea?
> Would you need more information from my side?
>
> I still cannot export the log in a UTF-8 or ASCII format...

Log has encoding_type = 'replace', which explains why you don't see the
umlauts when the output encoding can't represent them.

However, _why_ it thinks it can't represent them is the issue here. What
is the encoding on the subprocess created stdout for example?

Wouter van Heyst

Timmie (timmie) said :
#6

> What is the encoding on the subprocess created stdout for example?
Please tell me how I can find this out and I will post it ASAP.

Thanks.

Martin Pool (mbp) said :
#7

It's the .encoding attribute of the file object.
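
A minimal sketch of how these values could be inspected, written for Python 3 (the thread itself used Python 2). It only reports what the current interpreter sees and assumes nothing about bzr; the function name `describe_encodings` is made up for illustration. Note that a pipe created with `subprocess.PIPE` carries raw bytes, so the captured output has no encoding attached to it.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python reports for this process."""
    return {
        # Encoding of the stream that print() writes to; may be None
        # when stdout is not a text stream with a known encoding.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding that the locale settings suggest for text data.
        "preferred": locale.getpreferredencoding(False),
        # Encoding used to convert file names.
        "filesystem": sys.getfilesystemencoding(),
    }


for name, value in sorted(describe_encodings().items()):
    print("%-10s %s" % (name, value))
```

If "stdout" and "preferred" disagree, or either one is an ASCII-only encoding, that would be a first place to look.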

Timmie (timmie) said :
#8

9: _ip.system("bzr log --short > test.txt")
10: f = open('test.txt', 'r')
11: f.encoding
12: f.encoding()

11 & 12 do not show any output.

I get the changelog with the following:

3 :
p_log = subprocess.Popen(('bzr log --short'),
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE, bufsize=-1)
4 : (stdout, stderr) = p_log.communicate()

How can I know the encoding of the stdout string or convert the stdout into a UTF-8 string?
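
One possible approach to the conversion, as a sketch for Python 3: the bytes returned by `communicate()` do not record their encoding, so they have to be decoded with whatever encoding the child process actually used before they can be written out as UTF-8. In Python 2, writing a byte string to a file opened with `codecs.open` triggers an implicit ASCII decode, which is where the UnicodeDecodeError above comes from. The helper name `save_as_utf8` is hypothetical, and `cp850` is only a guess suggested by the byte 0x81 in the traceback, which is "ü" in that code page.

```python
import io
import os
import tempfile


def save_as_utf8(raw_output, path, source_encoding):
    """Decode captured bytes explicitly, then write them as UTF-8.

    source_encoding is an assumption the caller must supply; the bytes
    themselves do not say which encoding produced them.
    """
    text = raw_output.decode(source_encoding)
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return text


# Stand-in for the bytes captured from the subprocess: "für" in cp850.
sample = b"f\x81r"
target = os.path.join(tempfile.mkdtemp(), "changelog.txt")
save_as_utf8(sample, target, source_encoding="cp850")
```

With real output this would be called as `save_as_utf8(stdout, bzr_logfile_path, source_encoding=...)`, where `stdout` is the value returned by `p_log.communicate()`.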

Wouter van Heyst (larstiq) said :
#9

On Fri, Mar 13, 2009 at 03:56:53PM -0000, Tim wrote:
> Question #63601 on Bazaar changed:
> https://answers.launchpad.net/bzr/+question/63601
>
> Status: Answered => Open
>
> Tim is still having a problem:
> 9: _ip.system("bzr log --short > test.txt")
> 10: f = open('test.txt', 'r')
> 11: f.encoding
> 12: f.encoding()
>
> 11 & 12 do not show any output.

That means it's None.

> I get the changelog with the following:
>
> 3 :
> p_log = subprocess.Popen(('bzr log --short'),
> stdout=subprocess.PIPE, stderr=subprocess.PIPE, bufsize=-1)
> 4 : (stdout, stderr) = p_log.communicate()
>
> How can I know the encoding of the stdout string or convert the stdout
> into a utf-8 string?

Could you try changing stdout to bzr_log? The following works for me.

    import codecs
    import subprocess

    bzr_logfile_path = 'bzr_log.txt'
    target_encoding = 'utf-8'

    bzr_logfile = codecs.open(bzr_logfile_path, 'w', encoding=target_encoding)
    p_log = subprocess.Popen(['bzr', 'log', '--short'], stdout=bzr_logfile, stderr=subprocess.PIPE, bufsize=-1)
    (stdout, stderr) = p_log.communicate()
    bzr_logfile.close()

>> cat bzr_log.txt

    1 Wouter van Heyst 2009-03-14
      →↓←

Wouter van Heyst

Timmie (timmie) said :
#10

Hello,
that code suggestion worked for me.
But the umlauts still show up as strange or unreadable characters when I open the file.

Timmie (timmie) said :
#11

Hello,
I set up a self-contained example here:
https://bugs.launchpad.net/bzr/+bug/340394/comments/1

Could someone provide me with code to export the changelog including umlauts?

Thanks in advance!

Timmie (timmie) said :
#12

Sorry, I wanted to hit "Still need an Answer"...

Martin Pool (mbp) said :
#13

Hi Tim,

I downloaded your sample data. I'm running bzr 1.14dev on Ubuntu Jaunty with LANG=en_AU.UTF-8 (in a gnome-terminal). When I run "bzr log" I see the output with correct umlauts. If I redirect the output into a file with "bzr log >log.out" then I get a file that's byte-for-byte identical with what's sent to stdout. I can open it in e.g. gedit and see the umlauts there too.

Leaving aside for the moment trying to run bzr from inside your python program, what happens if you just run it from a terminal?

Timmie (timmie) said :
#14

Hello Martin,
thanks for looking at this.
Actually, I never tried this on Linux.
The problems occurred on a Windows-based install.
What do you suggest?

Martin Pool (mbp) said :
#15

Tim,

I suggest you start a cmd window, run bzr log there, and see if it works. Then try redirecting the output and open that in e.g. Notepad. It's possible we're detecting the wrong encoding on Windows.

Timmie (timmie) said :
#16

> start a cmd window, and run bzr log there and see if it works.
In cmd.exe and Console the umlauts are shown correctly!

> try redirecting the output and open that in eg notepad.
I opened the file in Notepad.exe and Notepad++.exe.

The umlauts are not shown correctly when the output is redirected to a file with:
bzr log --short > log.txt
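
This symptom is what one would expect if the redirected output is encoded in the console's OEM code page while the editor assumes the Windows ANSI code page. The thread does not state which code pages are in use, so `cp850` and `cp1252` below are assumptions (a common pairing on Western European Windows installs). A small Python 3 illustration of how the same bytes read under each:

```python
# "ü ä" as a console using code page 850 would encode it (assumed).
raw = "ü ä".encode("cp850")

# Decoded with the encoding that produced it, the text round-trips.
as_console = raw.decode("cp850")

# Decoded as cp1252, the same bytes turn into other characters;
# 0x81 is unassigned in cp1252, so it is replaced with U+FFFD.
as_editor = raw.decode("cp1252", errors="replace")

# ascii() keeps the output printable on any terminal encoding.
print(ascii(raw))
print(ascii(as_console))
print(ascii(as_editor))
```

The first byte, 0x81, is the same byte named in the UnicodeDecodeError earlier in this thread.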

Martin Pool (mbp) said :
#17

OK, so it does seem this is a real bug, 340394. I propose we close this question and just deal with it over there.

Éric Araujo (merwok) said :
#18

Does setting PYTHONIOENCODING=utf-8 help?
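
For reference, a sketch of how that variable could be set for a single child process from Python 3. `PYTHONIOENCODING` is a standard CPython environment variable; whether bzr's own encoding detection honours it is not confirmed anywhere in this thread. The helper name `run_with_utf8_io` is hypothetical, and the demonstration uses a child Python interpreter as a stand-in for bzr so that it runs without a repository.

```python
import os
import subprocess
import sys


def run_with_utf8_io(command):
    """Run command with PYTHONIOENCODING=utf-8 and return its raw stdout."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = "utf-8"
    process = subprocess.Popen(command, env=env,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    stdout, _stderr = process.communicate()
    return stdout


# Stand-in child: a Python interpreter that prints one umlaut.
child = [sys.executable, "-c", "print(u'\\u00fc')"]
output = run_with_utf8_io(child)
text = output.decode("utf-8").strip()
```

For the case in this thread the command would be `['bzr', 'log', '--short']`.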
