Translation import problems [1000 pcs. ;) ]

Asked by Basic

This question is about importing *.po and *.pot files written manually in Windows. During the discussion here, J. T. Vermeulen and C. Hovey helped me get through some of the problems. Unfortunately, as soon as one problem is solved, another appears.

Update 13.07.2009 14:45 CET

Question information

Language:
English
Status:
Solved
For:
Launchpad itself
Assignee:
Jeroen T. Vermeulen
Solved by:
Jeroen T. Vermeulen
Whiteboard:
This user has had a long struggle to get this project going: setting it up, pushing code, and now translations. He really wants to use Launchpad despite all the pain he has had.
Revision history for this message
Basic (basicxp) said :
#1

Come on! It's already 8th of July!
7 files: board.po, pass.po, passent.po, about.po, debug.po, splash.po and penalty.po have been in "Needs review" status for THREE days!

Curtis Hovey (sinzui) said :
#2

The files cannot be imported because the import expects POT files. Rename the files to .pot and upload those.

Basic (basicxp) said :
#3

Just need to wait.

Curtis Hovey (sinzui) said :
#4

Also, the base language for translations is English. I believe your POT files are Russian. The POT files that are uploaded must be English messages.

Basic (basicxp) said :
#5

What do I do now, because all these Russian strings must be translated to English etc.? Is there any way to do that?

Basic (basicxp) said :
#6

Sorry, correct phrase:
What do I do, because all these Russian strings must be translated to English and other languages, but not English to Russian etc. Is there any way to do that?

Curtis Hovey (sinzui) said :
#7

Sorry. I do not have a good answer. Launchpad does not support this scenario. The UI will break if the POT is not English. I think someone must do the hard work of changing the code to use English messages first. Then Launchpad can be used to translate to other languages.

Basic (basicxp) said :
#8

Well, that is not good news. But still, thank you!
By the way, do you know any site which supports this kind of translation (from Russian to others) with *.po and *.pot files? That would be nice to know.
Thanks for help, Roman.

Curtis Hovey (sinzui) said :
#9

I do not know of any other websites. It might be possible to do a rough translation of scorecard into English using one or more large packages in Launchpad. You can search for Russian messages in the database of used suggestions, and replace the native Russian messages with the English messages. You can then copy each English message and its Russian message to the Russian PO file.

For example:
https://translations.launchpad.net/ubuntu/jaunty/+source/firefox-3.0/+pots/firefox/ru/+translate?batch=10&show=all&search=%D0%AF%D0%BD%D0%B4%D0%B5%D0%BA%D1%81.%D0%9B%D0%B5%D0%BD%D1%82%D0%B0

When you are done you can upload the POT, then ask for help translating the English into an English dialect (en_GB) to get better messages to use in scorecard.
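
The workflow described above (English text becomes the msgid, the original Russian text becomes the msgstr in the Russian PO file) might be sketched roughly as follows. This is only an illustration: the function names and the sample message pairs are made up, and the required header entry is left out for brevity.

```python
def po_escape(text):
    """Escape a string for use inside a double-quoted PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def build_catalogs(pairs):
    """Return (pot_text, ru_po_text) for (english, russian) message pairs.

    The template gets English msgids with empty msgstrs; the Russian PO
    file reuses the same msgids and carries the Russian text as msgstr.
    """
    pot_entries = []
    ru_entries = []
    for english, russian in pairs:
        msgid = 'msgid "%s"' % po_escape(english)
        pot_entries.append(msgid + '\nmsgstr ""\n')
        ru_entries.append(msgid + '\nmsgstr "%s"\n' % po_escape(russian))
    return "\n".join(pot_entries), "\n".join(ru_entries)


# Hypothetical sample data, not taken from the real project.
pairs = [("Score", "Счёт"), ("Penalty", "Штраф")]
pot_text, ru_text = build_catalogs(pairs)
```

Both outputs would still need a header entry declaring the encoding before they could be uploaded.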

Basic (basicxp) said :
#10

I wanted to do pretty much the same thing: I'll translate the strings to English, upload them here, write a Russian translation and let others translate the strings from English to other languages. OK, I'll try. I'll write here if I succeed.

Basic (basicxp) said :
#11

What happened now? I uploaded the "sb-en.pot" file with English strings. A few hours went by and it is still in "Needs review".

Curtis Hovey (sinzui) said :
#12

I believe someone from the translations team needs to look at the queue and approve it, or explain if there is a fault. I assigned the question to Jeroen.

Basic (basicxp) said :
#13

Thank you very much!

Basic (basicxp) said :
#14

Still waiting... It's not a big problem if this doesn't work, but it would be cooler if all systems worked properly, wouldn't it? :)

Jeroen T. Vermeulen (jtv) said :
#15

Hello Roman,

Did you not get my email about this on July 7th? If not, perhaps it was discarded because it included the name of the project in the subject line, and something choked on the Cyrillic. I believe some spam filters are strict about non-ASCII subject lines, so I'll try to avoid those in the future.

My email said exactly what Curtis later explained here: the base language needs to be English. Since your new upload fixes this, I've approved it. As long as the filename doesn't change, future uploads will be approved automatically.

Don't forget to enable translation of your project in Launchpad! Under the project details, check the "Translations for this project are done in Launchpad" box, and save.

Jeroen

Jeroen T. Vermeulen (jtv) said :
#16

Excuse me: my email was dated July 10th, not the 7th. (My email client uses a confusing date notation.) We did not notice the uploads before then because of the naming problem.

Basic (basicxp) said :
#17

Yes, import succeeded, but there are 0 messages (strings) - how could that be?

Basic (basicxp) said :
#18

I found a mistake in my file - there were 59 msgid and 58 (!!) msgstr, so I corrected it and maybe now it will work.
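
A mismatch like this can be spotted before uploading with a quick count. The following is a minimal Python sketch, not a real PO parser: it only counts lines that start with the plain `msgid` and `msgstr` keywords and ignores plural forms (`msgid_plural`, `msgstr[0]`). The function name and sample text are made up for illustration.

```python
import re


def count_keywords(po_text):
    """Count lines starting with the msgid and msgstr keywords."""
    msgids = len(re.findall(r"^msgid\s", po_text, flags=re.MULTILINE))
    msgstrs = len(re.findall(r"^msgstr\s", po_text, flags=re.MULTILINE))
    return msgids, msgstrs


# Hypothetical sample: the second message is missing its msgstr line.
sample = 'msgid "Score"\nmsgstr ""\n\nmsgid "Penalty"\n'
msgids, msgstrs = count_keywords(sample)
if msgids != msgstrs:
    print("Mismatch: %d msgid vs %d msgstr" % (msgids, msgstrs))
```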

Basic (basicxp) said :
#19

Still "Failed" on the new file too. Could you please check what's wrong with it (sb-en.pot)? Thank you for help!

Jeroen T. Vermeulen (jtv) said (best answer) :
#20

I had already noticed that one and uploaded a fixed version, but that wasn't the real problem. Your file contains non-ASCII characters in some of the comments—Cyrillic text, presumably—but there's no header to specify what encoding they're in. When I remove those and upload again, at least I get proper errors again.

From the mention of VB files in the template I'm guessing that you may be using Windows. Windows uses UTF-16, whereas the rest of the world is mostly standardizing on UTF-8. So it looks like the parser stumbled over non-ASCII characters it couldn't decode as UTF-8.

Your file needs a header that, among other things, specifies the file's encoding if there are any non-ASCII characters. The header in a gettext file is a translation for the empty string, before any of the other messages. For you it might look something like:

msgid ""
msgstr ""
"Project-Id-Version: scoreboard 1.0\n"
"POT-Creation-Date: 2009-07-13 12:01+0000\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-16\n"
"Content-Transfer-Encoding: 8bit\n"

(Of course you'll have to change some of the details). I've filed bug 398473 about the fact that the failure did not produce a proper notification email etc.
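
To check a file for this situation before uploading, something like the Python sketch below could be used. It is a rough check under stated assumptions: the function names are made up, and it only works when the file is in an ASCII-compatible encoding such as UTF-8 (it would not find the header in a file actually saved as UTF-16).

```python
import re


def declared_charset(po_bytes):
    """Return the charset named in the PO header, or None if absent."""
    # The header fields are plain ASCII, so a Latin-1 decode (which never
    # fails) is enough to search for them.
    text = po_bytes.decode("latin-1")
    match = re.search(r'Content-Type:[^"\n]*charset=([A-Za-z0-9_.:-]+)', text)
    return match.group(1) if match else None


def has_non_ascii(po_bytes):
    """True if any byte falls outside the 7-bit ASCII range."""
    return any(b > 0x7F for b in po_bytes)


# Hypothetical sample header, written as it would appear in the file.
sample_header = (
    b'msgid ""\n'
    b'msgstr ""\n'
    b'"Content-Type: text/plain; charset=UTF-8\\n"\n'
)
print(declared_charset(sample_header))
```

A file for which `has_non_ascii` is true and `declared_charset` returns None is in the state described above: non-ASCII text with no declared encoding.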

Roman, you have been a very unlucky user so far. Thanks for sticking with it; we'll get it all sorted out and hopefully end up with a better service because of it!

Basic (basicxp) said :
#21

Don't tell me that I'm unlucky! I got the "Code" section working at the snap of my fingers, and the same with bugs, blueprints, answers etc. Now I'll try this encoding declaration. By the way, I wrote this file by hand, not by running "gettext".

Jeroen T. Vermeulen (jtv) said :
#22

Well most of it seems fine. :-) I just put up a fix for the bug I reported; hopefully we'll be able to roll it out with 2.2.7 in a week and a half.

Basic (basicxp) said :
#23

Well, now I got a message about one more FAILED import (*@?# !!). This time it shows the following: Error: [some meaningless characters]msgid "" (Line 1). It seems to be a problem with the file itself. Now I have tried converting the file to UNIX-format UTF-8 with Notepad++. Let's see what happens this time.

Here is a little "irony": the Code section, which I had never used before, works fine, but Rosetta, which I have used many times as a translator (Ubuntu), is really hard to set up properly. :)

Basic (basicxp) said :
#24

Here is the exact error message:

We were unable to import the file because of errors in its format:

Line 1: Invalid content: u'\ufeffmsgid ""'

Данило Шеган (danilo) said :
#25

Please check your files with "msgfmt -cv" before uploading, and if at all possible, produce the file with xgettext: that would ensure that you are creating properly formatted PO files in the first place.

To draw a parallel, it would be as if you tried to create ".bzr" directories for your branches in Notepad++: not impossible, but hard, and requires a lot more knowledge.
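
For anyone who wants to script that check, here is a minimal Python sketch. It assumes the gettext tools are installed and `msgfmt` is on the PATH; the helper names are made up. The `-c` and `-v` flags are the ones mentioned above, and `-o` sends the compiled output to the null device so no .mo file is left behind.

```python
import os
import shutil
import subprocess


def msgfmt_command(path):
    """Build the msgfmt invocation that checks a PO/POT file."""
    return ["msgfmt", "-c", "-v", "-o", os.devnull, path]


def check_po(path):
    """Run msgfmt on the file; return (ok, messages), or None if msgfmt is missing."""
    if shutil.which("msgfmt") is None:
        return None
    result = subprocess.run(msgfmt_command(path), capture_output=True, text=True)
    return result.returncode == 0, result.stderr


print(msgfmt_command("sb-en.pot"))
```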

Basic (basicxp) said :
#26

Где мне взять msgfmt и xgettext для Windows/DOS?

Where do I get msgfmt and xgettext for Windows/DOS?

Данило Шеган (danilo) said :
#27

Sorry, but I only speak Serbian :) You can look for "gettext windows" to find it, or just fetch ZIP files directly from http://ftp.gnu.org/pub/gnu/gettext/gettext-tools-0.13.1.bin.woe32.zip and http://ftp.gnu.org/pub/gnu/gettext/gettext-runtime-0.13.1.bin.woe32.zip (these are slightly older versions, but should be more than enough for what you need).

Refer to the documentation in those ZIP files for how to go on from there.

Jeroen T. Vermeulen (jtv) said :
#28

\ufeff means Unicode character with hex code FEFF. That's apparently in your file as a byte-order mark. Your editor may have inserted one of those at the beginning of the file to tell software that the file is in Little-Endian UTF-16, not in Big-Endian UTF-16. The parser expects the msgid keyword to come at the beginning of the line, but finds this character first.

Whether a program counts the character as text is a bit subjective, I suppose. It's not really whitespace, and it's not normally shown, but it can be more than just a format marker. So it's understandable that our parser sees it as an extraneous and confusing character.

If you convert the file to UTF-8, take care that the byte-order marker doesn't stay in. Such a marker isn't necessary in UTF-8 since it's byte-order-independent, but the character does still exist there.
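
A conversion along those lines might look like the Python sketch below. It is a minimal illustration with a made-up function name, assuming the file is either UTF-8 or UTF-16 and ignoring UTF-32. In UTF-8 the marker is the three bytes EF BB BF; in UTF-16 it is FF FE or FE FF.

```python
import codecs


def to_utf8_without_bom(raw):
    """Return the file content as UTF-8 bytes with no byte-order mark."""
    if raw.startswith(codecs.BOM_UTF8):
        # Already UTF-8: just drop the three marker bytes.
        return raw[len(codecs.BOM_UTF8):]
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec reads the marker to pick the byte order
        # and does not include it in the decoded text.
        return raw.decode("utf-16").encode("utf-8")
    # No marker found: leave the bytes untouched.
    return raw


cleaned = to_utf8_without_bom(b'\xef\xbb\xbfmsgid ""')
print(cleaned)
```

After converting, the charset named in the file's header should say UTF-8 as well, so that it matches the actual encoding.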

Basic (basicxp) said :
#29

Try #... oh, I forgot which one, it doesn't matter. So I downloaded a hex editor and found 3 bytes of cr*p before msgid! I deleted them, so let's see what happens now. :)

Basic (basicxp) said :
#30

By the way, could you please clean the old files (those marked Failed and Deleted) out of the log on the server, so that it is easier for me to navigate the "Translation Import Status" section? Thank you!

Basic (basicxp) said :
#31

Thanks Jeroen T. Vermeulen, that solved my question.

Jeroen T. Vermeulen (jtv) said :
#32

The Deleted entries are cleaned up automatically after... I believe 3 days. That gives people some time to think again. The Failed ones stick around, though we've got a bug somewhere saying we should clean them up as well.