encoding problem when exporting log
Hello,
how do I export the log of a repository to file with UTF-8 encoding if it contains characters such as ü, ä?
What I tried without success:
--- Changelog
bzr_logfile_path = os.path.
target_encoding = 'utf-8'
bzr_logfile = codecs.
p_log = subprocess.
(stdout, stderr) = p_log.communicate()
bzr_logfile.
bzr_logfile.close()
Now, I get this error:
"UnicodeDecodeE
The best would be to have something like:
bzr version-info --format python > library/log.py
P.S.: I would like to use the log file to automatically include this as a changelog in a Sphinx based documentation.
Any hints are welcome, thanks in advance!
Question information
- Language:
- English
- Status:
- Answered
- For:
- Bazaar
- Assignee:
- No assignee
- Last query:
- 2009-03-31
- Last reply:
- 2011-11-22
This question was reopened
- 2009-03-18 by Timmie
| Martin Pool (mbp) said : | #1 |
If you look in ~/.bzr.log, I suspect you'll find your locale is set to one that wants ascii output not utf-8. If you change that, bzr should no longer give that message.
However, we probably shouldn't be giving this error anyhow. Can you please file a bug about it including the traceback from ~/.bzr.log for the UnicodeDecodeError.
Or is that error in fact being raised in your driver program?
Also, why send the stdout through a pipe rather than just directly to the output file?
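A minimal sketch of sending the output straight to a file instead of through a pipe. It assumes Python 3; `run_to_file` is a hypothetical helper name, and the command is supplied by the caller (for example `["bzr", "log", "--short"]`). The bytes reach the file exactly as the child process emits them, so the driver program never decodes anything.

```python
import subprocess


def run_to_file(cmd, out_path):
    """Run cmd and send its stdout directly to out_path as raw bytes.

    Hypothetical helper: returns the exit code of the child process.
    """
    with open(out_path, "wb") as out_file:
        # The child writes into the file handle itself; Python never
        # sees (or decodes) the output.
        return subprocess.call(cmd, stdout=out_file)
```

The resulting file is in whatever encoding the child chose, so this avoids the Python-side error but does not by itself guarantee UTF-8.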
| Timmie (timmie) said : | #2 |
Hello,
thanks for your explanations.
I sometimes work on Windows. There I couldn't find the ~/.bzr.log.
Is it possible to set this on a per-repository or per-directory basis, just like .bzrignore?
The error was raised when saving, and later when reading, the output of bzr log.
| Timmie (timmie) said : | #3 |
> P.S.: I would like to use the log file to automatically include this as a changelog in a Sphinx based documentation.
I did not write it directly to a file because I wanted to see in IPython where the encoding issue occurred.
When I write directly to a file on the file system and open that file in an editor, the umlauts are not shown correctly; they are replaced by strange characters.
My code snippet:
### CODE ###
#--- BZR: changelog information
def write_changelog
bzr_
bzr_logfile = codecs.
p_log = subprocess.
(stdout, stderr) = p_log.communicate()
bzr_
bzr_
#UnicodeDec
# like bzr version-info --format python > vers_test.py
repo_path = os.path.join('..', '.')
output_dir = os.path.join('.')
write_changelog
gives the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 3091: ordinal not in range(128)
WARNING: Failure executing file: <sphinx_tools.py>
It does not pop up in bzr.log because it is not seen as a failure by bzr.
A bug was filed:
https:/
But the description is not yet complete.
Please give me a hint how to get around the encoding issue.
Thanks in advance.
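For reference, the message can be reproduced in isolation. Byte 0x81 is not valid ASCII, but it is `ü` in the DOS code pages cp437 and cp850. That suggests the log text arrived in the Windows console encoding, although this is an assumption that the thread does not confirm. A small Python 3 sketch with made-up sample bytes:

```python
raw = b"M\x81ller"  # sample bytes, not taken from the actual log

try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    # 0x81 lies outside the 7-bit ASCII range, hence the failure.
    error_text = str(exc)

# Decoding with the code page the bytes were written in succeeds.
decoded = raw.decode("cp850")

print(error_text)
print(decoded)
```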
| Timmie (timmie) said : | #4 |
Hello,
does anyone have an idea?
Would you need more information from my side?
I still cannot export the log in a UTF-8 or ASCII format...
| Wouter van Heyst (larstiq) said : | #5 |
On Thu, Mar 12, 2009 at 08:54:47AM -0000, Tim wrote:
> Question #63601 on Bazaar changed:
> https:/
>
> Tim gave more information on the question:
> Hello,
> does anyone have an idea?
> Would you need more information from my side?
>
> I still cannot export the log in a UTF-8 or ASCII format...
Log has encoding_type = 'replace', which explains why you don't see the
umlauts when the output encoding can't represent them.
However, _why_ it thinks it can't represent them is the issue here. What
is the encoding on the subprocess created stdout for example?
Wouter van Heyst
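To illustrate what a 'replace' policy does in general, here is a sketch using Python's own codec error handlers. It assumes Python 3 and does not reproduce bzr's internal code; the sample text is invented.

```python
text = "Änderung für März"

# Characters the target encoding cannot represent become '?'.
as_ascii = text.encode("ascii", "replace")

# An encoding that covers the characters keeps them intact.
as_utf8 = text.encode("utf-8", "replace")

print(as_ascii)
print(as_utf8.decode("utf-8"))
```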
| Timmie (timmie) said : | #6 |
> What is the encoding on the subprocess created stdout for example?
Please tell me how I can find this out and I will post it ASAP.
Thanks.
| Martin Pool (mbp) said : | #7 |
It's the .encoding attribute of the file object.
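A short sketch of where to look, assuming Python 3 (variable names are illustrative). Under Python 2 a file object from the built-in open() reports None for this attribute.

```python
import locale
import os
import sys
import tempfile

# Encoding Python attached to this process's standard output; it can be
# None when the stream has been replaced by an object without one.
stdout_encoding = getattr(sys.stdout, "encoding", None)

# Encoding the platform locale prefers for text files.
preferred = locale.getpreferredencoding(False)

# A text-mode file reports the encoding it was opened with.
path = os.path.join(tempfile.mkdtemp(), "probe.txt")
with open(path, "w", encoding="utf-8") as handle:
    file_encoding = handle.encoding

print(stdout_encoding, preferred, file_encoding)
```

Note that bytes captured from a subprocess pipe carry no encoding information at all; only the program that produced them knows which encoding it used.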
| Timmie (timmie) said : | #8 |
9: _ip.system("bzr log --short > test.txt")
10: f = open('test.txt', 'r')
11: f.encoding
12: f.encoding()
11 & 12 do not show any output.
I get the changelog with the following:
3 :
p_log = subprocess.
4 : (stdout, stderr) = p_log.communicate()
How can I know the encoding of the stdout string or convert the stdout into a utf-8 string?
| Wouter van Heyst (larstiq) said : | #9 |
On Fri, Mar 13, 2009 at 03:56:53PM -0000, Tim wrote:
> Question #63601 on Bazaar changed:
> https:/
>
> Status: Answered => Open
>
> Tim is still having a problem:
> 9: _ip.system("bzr log --short > test.txt")
> 10: f = open('test.txt', 'r')
> 11: f.encoding
> 12: f.encoding()
>
> 11 & 12 do not show any output.
That means it's None.
> I get the changelog with the following:
>
> 3 :
> p_log = subprocess.
> stdout=
> 4 : (stdout, stderr) = p_log.communicate()
>
> How can I know the encoding of the stdout string or convert the stdout
> into a utf-8 string?
Could you try changing stdout to bzr_log? The following works for me.
import codecs
import subprocess
bzr_
target_encoding = 'utf-8'
bzr_logfile = codecs.
p_log = subprocess.
(stdout, stderr) = p_log.communicate()
bzr_
>> cat bzr_log.txt
1 Wouter van Heyst 2009-03-14
→↓←
Wouter van Heyst
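The general pattern (capture the bytes, decode them explicitly, write UTF-8) could be sketched as follows. This is not the snippet posted above, whose calls are cut off; it assumes Python 3, `export_as_utf8` is a hypothetical helper, and the caller has to know which encoding the child process emits.

```python
import subprocess


def export_as_utf8(cmd, out_path, source_encoding):
    """Capture cmd's stdout, decode it, and save it as UTF-8.

    Hypothetical helper; source_encoding must match what the child
    process actually emits (for example 'cp850' or 'utf-8').
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    raw, _ = proc.communicate()
    # Decode explicitly instead of relying on an implicit ASCII step.
    text = raw.decode(source_encoding, "replace")
    # newline="" writes line endings exactly as they were received.
    with open(out_path, "w", encoding="utf-8", newline="") as out_file:
        out_file.write(text)
    return text
```

A call might look like `export_as_utf8(["bzr", "log", "--short"], "bzr_log.txt", "cp850")`, where `"cp850"` is only a guess for a Western European Windows console.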
| Timmie (timmie) said : | #10 |
Hello,
that code suggestion worked for me.
But the umlauts still appear as strange characters, or are unreadable, when opening the file.
| Timmie (timmie) said : | #11 |
Hello,
I set up a self-contained example here:
https:/
Could someone provide me with code to export the changelog including umlauts?
Thanks in advance!
| Timmie (timmie) said : | #12 |
Sorry, wanted to hit the "Still need an Answer"...
| Martin Pool (mbp) said : | #13 |
Hi Tim,
I downloaded your sample data. I'm running bzr 1.14dev on Ubuntu Jaunty with LANG=en_AU.UTF-8 (in a gnome-terminal.) When I run "bzr log" I see the output with correct umlauts. If I redirect the output into a file with "bzr log >log.out" then I get a file that's byte-for-byte identical with what's sent to stdout. I can open it in eg gedit and see the umlauts there too.
Leaving aside for the moment trying to run bzr from inside your python program, what happens if you just run it from a terminal?
| Timmie (timmie) said : | #14 |
Hello Martin,
thanks for looking at this.
Actually, I never tried this on Linux.
The problems occurred on a Windows-based install.
What do you suggest?
| Martin Pool (mbp) said : | #15 |
Tim,
I suggest you start a cmd window, and run bzr log there and see if it works. Then try redirecting the output and open that in eg notepad. It's possible we're detecting the wrong encoding on Windows.
| Timmie (timmie) said : | #16 |
> start a cmd window, and run bzr log there and see if it works.
In cmd.exe and in the console the umlauts are shown correctly!
> try redirecting the output and open that in eg notepad.
I opened in Notepad.exe & Notepad++.exe
The umlauts are wrong and not shown correctly when redirected to a file by:
bzr log --short > log.txt
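One possible explanation for this symptom, offered as an assumption rather than a diagnosis: the redirected file holds bytes in the console code page, while the editor reads them with the Windows ANSI code page. The sketch below uses cp850 and cp1252 as stand-ins for those two encodings (typical for Western European Windows, but not confirmed in this thread) and assumes Python 3.

```python
original = "ä"

# Bytes as a cp850 console program would emit them.
console_bytes = original.encode("cp850")

# The same bytes read by an editor that assumes cp1252.
misread = console_bytes.decode("cp1252")

# Reading with the matching code page restores the character.
restored = console_bytes.decode("cp850")

print(repr(console_bytes), misread, restored)
```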
| Martin Pool (mbp) said : | #17 |
OK, so it does seem this is a real bug, 340394. I propose to close this question and deal with it over there.
| Éric Araujo (merwok) said : | #18 |
Does setting PYTHONIOENCODIN
