slovenian translations getting broken by the translation system

Bug #121555 reported by Luka Frelih
Affects                               Status   Importance  Assigned to  Milestone
Launchpad itself                      Triaged  High        Unassigned
language-pack-gnome-sl (Ubuntu)       New      Undecided   Unassigned
language-pack-gnome-sl-base (Ubuntu)  New      Undecided   Unassigned
language-pack-kde-sl (Ubuntu)         New      Undecided   Unassigned
language-pack-kde-sl-base (Ubuntu)    New      Undecided   Unassigned

Bug Description

slovenian translations shipped with ubuntu are in a really bad state,
to the point of being more annoying than useful.

the upstream translations and the ones in debian definitely don't have these issues.

this seems to be mostly caused by rosetta's poor handling of the merging process,
but i can only guess where the real problem is as i have little knowledge of that process.

symptoms:
-plural forms are unpredictably wrong. slovenian has 4 forms: singular, dual and two plurals
usual gettext formula is something like this:
 n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3
0, singular: V smeteh je 1 predmet.
1, dual: V smeteh sta 2 predmeta.
2, plural a: V smeteh so 3 predmeti. (or 4)
3, plural b: V smeteh je 5 predmetov. (or more or 0 but not 1,2,3 or 4 modulo 100)
this is a correct example of the string "N objects in trash" which is handled wrongly

-sometimes really strange translations of usually short strings appear in the strangest places,
for instance "Ime barve (color name)" in the filename column header in gnome file open dialog,
or "razširitev" (extension in sense of plugin) on the file extension column.
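
To make the plural-forms symptom concrete, here is a minimal sketch of how the formula quoted above selects one of the four forms. The function names are made up for illustration, and the formula is hand-transcribed from the C expression into Python rather than taken from any gettext code.

```python
def slovenian_plural_index(n: int) -> int:
    """Hand-written Python transcription of the C expression quoted above."""
    r = n % 100
    if r == 1:
        return 0  # singular
    if r == 2:
        return 1  # dual
    if r in (3, 4):
        return 2  # plural a
    return 3      # plural b (0, 5 and up, anything not ending in 01..04)


# The four example sentences from the report, in index order.
FORMS = [
    "V smeteh je {n} predmet.",
    "V smeteh sta {n} predmeta.",
    "V smeteh so {n} predmeti.",
    "V smeteh je {n} predmetov.",
]


def trash_message(n: int) -> str:
    """Pick the sentence whose index the formula returns for n."""
    return FORMS[slovenian_plural_index(n)].format(n=n)


for n in (0, 1, 2, 3, 4, 5, 101, 102):
    print(trash_message(n))
```

The strings only come out right when the order of the forms matches the formula that is evaluated at runtime.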

these issues project a very bad image of the slovenian linux localization efforts as more and more people install ubuntu and end up with a computer talking pidgin slovene to them. don't expect people to correct these errors over the no-search-or-navigation web interface of rosetta (i tried that for a while too, it's very ungratifying). it will also be impossible to offer ubuntu in this state to any kind of larger installation with lots of machines used by mere humans in slovenia. when i installed ubuntu for my parents i set up english messages because i didn't want them to be confused or turned off by this on their first contact with linux.

i realize that the language packs system gives you some benefits as a distributor but this is a regression that shouldn't be ignored.

Revision history for this message
Carlos Perelló Marín (carlos) wrote :

I guess that problem is due to the fact that the Ubuntu Slovenian translation group was not created until three months ago. Anyone was able to change translations until that point.

Please, contact them (https://launchpad.net/~ubuntu-l10n-sl/) to give them input on this, although I guess they should be aware of it already.

If any work is needed from us to do a full revert, ask the current Slovenian coordinator to request it at https://answers.launchpad.net/rosetta

Rejected this from Launchpad translations because it is not really a problem with Launchpad itself.

Changed in rosetta:
status: New → Invalid
Revision history for this message
Štefan Baebler (stefanba) wrote :

Indeed, Slovenian is not the easiest language :)

Not sure how the grammatical number gets chosen at runtime, but sometimes it might be hard already during the translation phase (it took a while for me to grasp the gettext formula).
I came across Rosetta through the default install of Ubuntu ("translate this application" or so), so there are probably many novice translators who would need a better explanation of that when translating. Maybe a short explanation and an example phrase should be shown in the translation interface.

https://translations.launchpad.net/josm/trunk/+pots/keys/sl/201/+translate
https://translations.launchpad.net/josm/trunk/+pots/keys/sl/202/+translate

Explanations at each grammatical number could look like (based on Luka's suggestions):
[0], singular (ednina): V smeteh je 301 predmet.
[1], dual (dvojina): V smeteh sta 4.802 predmeta.
[2], plural (množina) 3 or 4: V smeteh so 3 predmeti.
[3], plural (množina) 0 or 5 or more: V smeteh je 5 predmetov.
This would need to be set by language admins, or whoever sets the gettext formula.
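
As a rough illustration of how such hints could be derived from the formula instead of being typed in by hand, the sketch below collects a few sample numbers for each plural index. This is only an assumption about one possible approach, not how Launchpad does or will do it; `canonical_sl` and `sample_numbers` are hypothetical names.

```python
def canonical_sl(n: int) -> int:
    """The Slovenian formula from the report (n=1 maps to index 0)."""
    r = n % 100
    return 0 if r == 1 else 1 if r == 2 else 2 if r in (3, 4) else 3


def sample_numbers(plural, nplurals, per_form=3, limit=200):
    """Collect up to `per_form` example values of n for every plural index."""
    samples = {index: [] for index in range(nplurals)}
    for n in range(limit):
        bucket = samples[plural(n)]
        if len(bucket) < per_form:
            bucket.append(n)
    return samples


hints = sample_numbers(canonical_sl, 4)
for index, numbers in hints.items():
    print(f"[{index}] used for n = {', '.join(map(str, numbers))}, ...")
```

A translator would then see concrete numbers next to each field rather than the raw C expression.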

----
>for instance "Ime barve (color name)" in the filename column header in gnome file open dialog,
In this case it was probably originally just "name" in a color picking dialogue, and the same in the file open dialogue. It had one original msgid, but was translated with the color dialogue in mind (e.g. the message "pick a colour" for the title bar of the color picking dialogue was offered for translation just before).

> or "razširitev" (extension in sense of plugin) on the file extension column.
I guess this one is due to the fact that in slovenian we have "razširitev" for plugin and "pripona" or "končnica" for file name extension. But in english it can all be "extension", thus both sharing a single msgid.

It might also be confusing the suggestion algorithm (picking translations of the same expressions in other applications), offering bad advice to the translator if they are not really sure of the context. It helps to determine the message's context by looking at which files it appears in, but sometimes Slovenian's rich vocabulary can't be mapped directly onto the homonymous English messages. At least not without making it sound ridiculous.

Would it be possible to somehow "branch" the translation of some message (eg "extension") into several translations (eg "razširitev" (plugin) and "pripona" (filename extension)), depending on context?

Example of _totally_ wrong suggestions:
https://translations.launchpad.net/josm/trunk/+pots/keys/sl/41/+translate
english: Open
suggestions: Operacija (operation), Dodaj (add), _Odpri (_Open)
I'm sure a proper translation must already exist, but it isn't suggested for some reason. (Is it limited to the first 3 suggestions instead of the 3 most popular suggestions?)

Revision history for this message
Carlos Perelló Marín (carlos) wrote :

There is already a bug report to show plural form formulas in a human readable format:

https://bugs.launchpad.net/rosetta/+bug/90322

It's not a trivial task, but we plan to add it.

About the problem with context: because of the way Gettext works, we cannot do it at the Launchpad level. You need to contact the software maintainers so they add such context at the application level. With that, .po files will get such context comments and Launchpad will show them to translators (well, not yet, but the infrastructure to use that feature from Gettext is part of our tasks for the next development cycle).
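
To illustrate what application-level context buys, here is a toy sketch in which translations are keyed by a (context, msgid) pair, which is the idea behind the msgctxt field in .po files. The catalog, the `translate` helper and the context names "plugin" and "filename" are all hypothetical; this is not the gettext API.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory catalog keyed by (context, msgid). The context
# labels are invented for this example.
CATALOG: Dict[Tuple[Optional[str], str], str] = {
    ("plugin", "Extension"): "Razširitev",
    ("filename", "Extension"): "Pripona",
}


def translate(msgid: str, context: Optional[str] = None) -> str:
    """Look up a translation, falling back to the untranslated msgid."""
    return CATALOG.get((context, msgid), msgid)


print(translate("Extension", "plugin"))
print(translate("Extension", "filename"))
print(translate("Extension"))  # no context given, so no translation is found
```

Without the context the two uses of "Extension" collapse into one msgid, which is the situation described in the previous comment.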

Anyway, I doubt all those problems come from the last three months; more likely they come from the last two and a half years, since we deployed Ubuntu translations with Launchpad.

Revision history for this message
Luka Frelih (luka) wrote :

imho the translations were broken from the start in the import process, not by clueless rosetta users.
i was one of the users trying to get them into better shape for (i think) breezy but the web interface doesn't make that easy at all and the feedback loop for translation packs is very long.

the localization team lead was included in the email exchange prior to submitting this bug. hoping they will share their opinion too when they notice this.

i would like to know what we can do to help identify the cause and possible remedies. i hope no one expects us to manually fix everything, this has already proven to be impossible. so we are probably not talking about a revert but a reimport from debian. without breaking plurals at least.

what will happen with the next release when the gnome translations are again updated upstream?

i fear that there is indeed a system issue in rosetta causing this. how have you managed to decide so fast that there isn't one and marked it invalid?

i'm not ranting, just trying to understand. thank you.

Revision history for this message
Carlos Perelló Marín (carlos) wrote :

We know there are some missing features to help translators to handle the translation work flow and with every update we fix something else.

I cannot believe that Launchpad Translations (a.k.a. Rosetta) imported wrong translations causing the mess you describe, because we never had that problem with other languages. I agree, though, that not being able to handle the translation work flow easily would contribute to reaching that point. Of course, this still needs someone submitting those translations.

This bug report doesn't really report a problem with Launchpad software itself, that's why I rejected its tasks and opened it in the language packs which are the ones with the broken data that should be fixed, using Launchpad.

As already said, we could 'revert' (not 'remove') all translations so they have whatever the source packages have, without any change done in Launchpad. We have done it in the past for some templates, but it could be done for a whole distro series (it's an expensive operation, so don't take this as a standard procedure that is available). We need confirmation from the current Ubuntu Slovenian coordinator to prevent people abusing the system.

I hope this clarifies it a bit more.

cheers.

Revision history for this message
Luka Frelih (luka) wrote :

thank you for explaining. but i must still insist that what you cannot believe is indeed happening.

compare the translations for beagle, for example. both gotten via msgunfmt
http://ljudmila.org/~luka/po/debian/beagle.mo.po is the correctly working one in debian sid
http://ljudmila.org/~luka/po/ubuntu/beagle.mo.po is the broken one from ubuntu feisty language pack

notice that they use different plurals formulas. in fact, all of the packages in ubuntu use the same formula,
where in debian at least these two functionally different formulas are used by different programs.

"Plural-Forms: nplurals=4; plural=(n%100==1 ? 1 : n%100==2 ? 2 : n%100==3 || n"
"%100==4 ? 3 : 0);\n"
is the one used in correct beagle po

"Plural-Forms: nplurals=4; plural=(n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n"
"%100==4 ? 2 : 3);\n"
is the one used in ubuntu beagle po and in all the rest of translation files found in ubuntu language packs

note how indexes are shifted one place between them

if we look at this translation unit:
msgid "Contains {0} Item"
msgid_plural "Contains {0} Items"
msgstr[0] "Vsebuje {0} predmetov"
msgstr[1] "Vsebuje {0} predmet"
msgstr[2] "Vsebuje {0} predmeta"
msgstr[3] "Vsebuje {0} predmete"
it is exactly the same in both the working and broken translation. i guess it was not changed via launchpad.

so, what i am more and more convinced is happening is that rosetta assumes all slovenian translations will be using the same plurals formula, importing the strings by index directly and always uses the canonical formula it has set (one where n=1 has index 0) when exporting. this is clearly a wrong assumption and something that needs to be fixed before a reimport is tried in coordination with the language team, hopefully well before the next long-term-support release.
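
A small sketch of the suspected failure, using the beagle strings above: the same msgstr list is read once with the formula it was written for and once with the formula found in the ubuntu language packs. Both formulas are hand-transcribed into Python, and the names `rotated`, `canonical` and `render` are made up for this example.

```python
def rotated(n: int) -> int:
    """Formula from the working debian beagle .po (n=1 maps to index 1)."""
    r = n % 100
    return 1 if r == 1 else 2 if r == 2 else 3 if r in (3, 4) else 0


def canonical(n: int) -> int:
    """Formula found in the ubuntu language packs (n=1 maps to index 0)."""
    r = n % 100
    return 0 if r == 1 else 1 if r == 2 else 2 if r in (3, 4) else 3


# msgstr[0..3] exactly as listed in the translation unit above.
MSGSTR = [
    "Vsebuje {0} predmetov",
    "Vsebuje {0} predmet",
    "Vsebuje {0} predmeta",
    "Vsebuje {0} predmete",
]


def render(plural, n: int) -> str:
    """Select a msgstr with the given formula and fill in the number."""
    return MSGSTR[plural(n)].format(n)


for n in (1, 2, 3, 5):
    print(f"n={n}: debian -> {render(rotated, n)!r}, ubuntu -> {render(canonical, n)!r}")
```

With the ubuntu formula every number lands one slot away from the form that was written for it, so n=1 gets the form meant for 5 and up.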

there are two possible fixes. either
a) store the original plurals formula that came with the translation on import, then use it for export and display it to the translator in the web interface (in original format if human readable is still some time off) or
b) do the necessary mapping (shift) between them on import, so that translators in launchpad can depend on the canonical formula being used in ubuntu.
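
Option b) could look roughly like the sketch below: derive the index mapping by evaluating both formulas over a range of n, then reorder the msgstr list on import. This is only an illustration under the assumption that the two formulas differ by nothing but a permutation of indexes; `index_mapping` and `remap` are hypothetical helpers, not Launchpad code.

```python
def rotated(n: int) -> int:
    """Source formula: n=1 maps to index 1, everything else to 0."""
    r = n % 100
    return 1 if r == 1 else 2 if r == 2 else 3 if r in (3, 4) else 0


def canonical(n: int) -> int:
    """Target formula: n=1 maps to index 0, everything else to 3."""
    r = n % 100
    return 0 if r == 1 else 1 if r == 2 else 2 if r in (3, 4) else 3


def index_mapping(source, target, nplurals, limit=200):
    """Map each source index to the target index used for the same n."""
    mapping = {}
    for n in range(limit):
        src, dst = source(n), target(n)
        if mapping.setdefault(src, dst) != dst:
            raise ValueError("formulas are not a simple index permutation")
    expected = list(range(nplurals))
    if sorted(mapping) != expected or sorted(mapping.values()) != expected:
        raise ValueError("mapping does not cover every plural index")
    return mapping


def remap(msgstrs, mapping):
    """Reorder msgstrs so they match the target formula."""
    result = [None] * len(msgstrs)
    for src, dst in mapping.items():
        result[dst] = msgstrs[src]
    return result


mapping = index_mapping(rotated, canonical, 4)
imported = remap(["predmetov", "predmet", "predmeta", "predmete"], mapping)
print(mapping)
print(imported)
```

The check for a clean permutation matters: a header such as nplurals=2 cannot be remapped this way and would have to be rejected or handled separately.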

they both have different good and bad sides re compatibility with upstream and consistency. i lean towards the b option being better in the long run. but have no idea which would be easier to achieve. i discount the option to convince all upstreams to conform to the same formula as unfeasible at this time.

it should not matter even if this does not affect any other language; this looks like a rosetta bug. please don't keep it marked invalid. slovenian translations in upstream might be incomplete, strange or missing sometimes but evidence shows their plurals get broken on import into ubuntu, not by launchpad users.

Revision history for this message
Matthew Paul Thomas (mpt) wrote :

Thanks for your patience, Luka. Marking New until Carlos (or another Translations developer) examines the plural forms issue in particular. Why is it that different packages use different Slovenian plural formulas in the first place?

Changed in rosetta:
importance: Undecided → High
status: Invalid → New
Revision history for this message
Carlos Perelló Marín (carlos) wrote :

Indeed, the plural form problem is actually an error. I misunderstood your problem.

We use a default plural formula for each language as a way to fix any broken formula (we saw a lot of them), but it seems that when we got the Slovenian one, we didn't use a sane default. Thus, the plural forms are imported correctly; the problem is with the plural form formula we export in the .po file.

Could you confirm whether all Slovenian .po files use the same plural form formula? (I mean, outside Launchpad).

This problem is related to https://bugs.launchpad.net/rosetta/+bug/70470

The other problem, with non-plural-form translations, still needs to be fixed manually by a Slovenian translator (or by doing the global revert I offered).

Cheers.

Changed in rosetta:
status: New → Confirmed
Revision history for this message
Luka Frelih (luka) wrote :

in debian there are 2 index-incompatible variants (position rotated one place), each appearing in a few slight variations of parentheses and spacing.
the single formula in ubuntu seems to be the most popular one in debian, but the other formula, rotated by one place, which causes this bug, is used in more than a third of the files.
all the formulas use plurals based on n modulo 100.

why? most likely because it all works in gettext.

sample edited output of grep on pofiles:
$ grep -h1 Plural-Forms debian/* | sort | uniq -c | sort -n
      1 "Plural-Forms: nplurals=2; plural=(n != 1);\n"
      1 "Plural-Forms: nplurals=4; plural=(n%100==1 ? 1 : n%100==2 ? 2 : n%100==3 || "
      1 "n%100==4 ? 3 : 0);\n"
      1 "Plural-Forms: Plural-Forms: nplurals=4; plural=(n%100==1 ? 1 : n%100==2 ? "
      1 "2 : n%100==3 || n%100==4 ? 3 : 0);\n"
      7 "Plural-Forms: nplurals=4; plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%"
      7 "100==4 ? 2 : 3\n"
     18 "Plural-Forms: nplurals=4; plural=(n%100==1 ? 1 : n%100==2 ? 2 : n%100==3 || n"
     18 "%100==4 ? 3 : 0);\n"
     23 "Plural-Forms: nplurals=4; plural=(n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n"
     23 "%100==4 ? 2 : 3);\n"
     51 --
$ grep -h1 Plural-Forms ubuntu/* | sort | uniq -c | sort -n
     52 "Plural-Forms: nplurals=4; plural=(n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n"
     52 "%100==4 ? 2 : 3);\n"
     52 --

would be nice if both variants were detected as valid slovenian and imported in a compatible way. (merged)
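
One way such detection might work is to group headers by what their formula actually computes, so that differences in parentheses and spacing disappear and only the two real variants remain. The sketch below assumes the wrapped header lines have already been joined into one string, and it relies on `gettext.c2py`, an undocumented helper in CPython's gettext module; `behaviour` is a hypothetical name.

```python
import gettext
import re
from collections import defaultdict


def behaviour(header_value: str, limit: int = 200):
    """Return the plural index the header's formula picks for each n."""
    match = re.search(r"\bplural\s*=\s*([^;]+)", header_value)
    if match is None:
        raise ValueError("no plural= expression found")
    # Assumption: gettext.c2py (undocumented) is available in this Python.
    plural = gettext.c2py(match.group(1).strip())
    return tuple(plural(n) for n in range(limit))


# Three of the spellings from the grep output above, joined into one line.
HEADERS = [
    "nplurals=4; plural=(n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3);",
    "nplurals=4; plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3",
    "nplurals=4; plural=(n%100==1 ? 1 : n%100==2 ? 2 : n%100==3 || n%100==4 ? 3 : 0);",
]

groups = defaultdict(list)
for header in HEADERS:
    groups[behaviour(header)].append(header)

for signature, members in groups.items():
    print(f"index for n=1 is {signature[1]}: {len(members)} spelling(s)")
```

Headers that fall into the same group can be imported identically, and a header from the other group would need its strings reordered first.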

Revision history for this message
Bernard Banko (beernarrd) wrote :

The main reason for the second most widespread variant is mnemonic (although the one in Launchpad is simpler to work out):

If we define plural forms as:
nplurals=4; plural=(n%100==1 ? 1 : n%100==2 ? 2 : n%100==3 || n%100==4 ? 3 : 0);
we get:

msgid "%d file"
msgid_plural "%d files"
msgstr[0] "%d datotek"
msgstr[1] "%d datoteka"
msgstr[2] "%d datoteki"
msgstr[3] "%d datoteke"

or to make it more clear:
0, plural: "5 datotek"
1, singular: "1 datoteka"
2, dual: "2 datoteki"
3, plural for 3 and 4: "3 datoteke"

Here we can see that each index matches the number it is used for (1, 2 and 3).
So if we are going to unify plural forms, it is much better from the translators' point of view to have them in this form.

Revision history for this message
Данило Шеган (danilo) wrote :

An already anticipated problem, I'll be working on it for 1.1.12. See #70470 for tracking progress.

To change the default plural forms formula in Launchpad for Slovenian, please open a question in the answer tracker: https://answers.launchpad.net/rosetta/.

Thanks for all the details about the problem, it's highly appreciated!

Revision history for this message
Bernard Banko (beernarrd) wrote :

Question opened
(Question #18324)

Revision history for this message
Štefan Baebler (stefanba) wrote :

> Question opened
> (Question #18324)
For easier access here's the link:
https://answers.launchpad.net/rosetta/+question/18324
