Ubuntu
langpack-locales package

Bug #266975
Comment #92

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

(In reply to Jwtiyar Nariman from comment #62)

> > Other characters not in this test file are sorted according to the defaults
> > from
> >
> > copy "iso14651_t1"
>
> Sorting is good now, but adding these
> reorder-after <S0631> % ر
> > <S0695> % ڕ
> >
> > reorder-after <S0646> % ن
> > <S0648> % و
> > <S06C6> % ۆ
> iam not understanding because for example this " <S0695> % ڕ " how you
> order it?

copy "iso14651_t1"

contains

copy "iso14651_t1_common"

and some modifications which affect only Chinese and Japanese.

So we look into the iso14651_t1_common file to see what the default sort order is.

We find for example:

...
<S0631> % ARABIC LETTER REH
<S0632> % ARABIC LETTER ZAIN
<S0691> % ARABIC LETTER RREH
<S0692> % ARABIC LETTER REH WITH SMALL V
<S0693> % ARABIC LETTER REH WITH RING
<S0694> % ARABIC LETTER REH WITH DOT BELOW
<S0695> % ARABIC LETTER REH WITH SMALL V BELOW
<S0696> % ARABIC LETTER REH WITH DOT BELOW AND DOT ABOVE
...

Looking at this you see that ڕ U+0695 ARABIC LETTER REH WITH SMALL V BELOW
is sorted right after ڔ U+0694 ARABIC LETTER REH WITH DOT BELOW by default.
That is not what you want for Kurdish. For Kurdish, you want
ڕ U+0695 ARABIC LETTER REH WITH SMALL V BELOW to be sorted right after
ر U+0631 ARABIC LETTER REH.

This is achieved by the rule:

reorder-after <S0631> % ر
<S0695> % ڕ

This rule removes U+0695 from its default position in the sort order
and inserts it again right after U+0631.

Similarly, the rule

reorder-after <S0646> % ن
<S0648> % و
<S06C6> % ۆ

changes the sorting of U+0648 and U+06C6 so that they come right after U+0646.
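
To make the "remove and insert again" behaviour concrete, here is a rough Python sketch that models it on a plain list. The symbol list is copied from the iso14651_t1_common excerpt above; the `reorder_after` helper is hypothetical and only imitates the effect of the rule, it is not how glibc implements it.

```python
# Default order of the REH family, copied from the iso14651_t1_common
# excerpt quoted earlier in this comment.
DEFAULT_ORDER = [
    "<S0631>",  # ARABIC LETTER REH
    "<S0632>",  # ARABIC LETTER ZAIN
    "<S0691>",  # ARABIC LETTER RREH
    "<S0692>",  # ARABIC LETTER REH WITH SMALL V
    "<S0693>",  # ARABIC LETTER REH WITH RING
    "<S0694>",  # ARABIC LETTER REH WITH DOT BELOW
    "<S0695>",  # ARABIC LETTER REH WITH SMALL V BELOW
    "<S0696>",  # ARABIC LETTER REH WITH DOT BELOW AND DOT ABOVE
]


def reorder_after(order, anchor, moved):
    """Return a new order with the `moved` symbols placed right after `anchor`.

    Hypothetical helper: a list-based model of the rule, not glibc code.
    """
    if anchor in moved:
        raise ValueError("anchor cannot be one of the moved symbols")
    # Step 1: remove the moved symbols from their default positions.
    remaining = [sym for sym in order if sym not in moved]
    # Step 2: insert them again, in the given sequence, after the anchor.
    pos = remaining.index(anchor) + 1
    return remaining[:pos] + list(moved) + remaining[pos:]


kurdish_order = reorder_after(DEFAULT_ORDER, "<S0631>", ["<S0695>"])
print(kurdish_order)
```

In this model `<S0695>` ends up directly between `<S0631>` and `<S0632>`, and every other symbol keeps its relative position.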

To find out which of these rules I needed, I first created the ckb_IQ.UTF-8.in
test file and wrote the Kurdish characters into that file in the order you
wanted.

Then I ran a test sort using a ckb_IQ locale which had *only*

LC_COLLATE
copy "iso14651_t1"
END LC_COLLATE

and *nothing* else.

The test sort showed that only U+0695, U+0648, and U+06C6 were sorted incorrectly.
All other characters from your list of Kurdish characters were sorted correctly
already. So I needed only to add rules to fix the sort order for these 3 characters.
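
The idea behind such a test sort can be sketched in Python as follows. This is only an illustration under stated assumptions: the weights are toy values taken from the order of the iso14651_t1_common excerpt above, and `out_of_order_pairs` is a hypothetical helper, not the glibc test tool. With an installed locale, `locale.strxfrm` (after `locale.setlocale(locale.LC_COLLATE, ...)`) could serve as the key function instead of the toy weights.

```python
def out_of_order_pairs(expected, key):
    """Return adjacent pairs (a, b) from `expected` where key(a) > key(b).

    An empty result means `expected` is already sorted under `key`.
    """
    return [(a, b) for a, b in zip(expected, expected[1:]) if key(a) > key(b)]


# Toy default weights: position in the iso14651_t1_common excerpt.
# A real check would use the collation of the locale under test.
DEFAULT_SEQUENCE = [
    "\u0631",  # ARABIC LETTER REH
    "\u0632",  # ARABIC LETTER ZAIN
    "\u0691",
    "\u0692",
    "\u0693",
    "\u0694",
    "\u0695",  # ARABIC LETTER REH WITH SMALL V BELOW
    "\u0696",
]
TOY_WEIGHT = {ch: i for i, ch in enumerate(DEFAULT_SEQUENCE)}

# Order wanted for Kurdish: REH, REH WITH SMALL V BELOW, ZAIN.
wanted = ["\u0631", "\u0695", "\u0632"]

problems = out_of_order_pairs(wanted, TOY_WEIGHT.__getitem__)
for a, b in problems:
    print(f"U+{ord(a):04X} sorts after U+{ord(b):04X} by default")
```

With these toy weights the only pair reported is U+0695 followed by U+0632, which matches the need for a rule that moves U+0695.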

You can see the same thing by reading iso14651_t1_common and checking which
of the Kurdish characters are already in the correct order in that file and which are not.
You do not have to do anything for the characters which are already in the correct order.
For the characters which are in the wrong position in iso14651_t1_common, you add
rules like

reorder-after <... collating-symbol after which to reorder ...>
<... the collating-symbol which should be reordered ...>
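
As a small illustration of this template, the following Python sketch builds the text of one rule from an anchor character and the characters to move. It assumes the `<SXXXX>` naming pattern seen in the rules in this comment; `format_reorder_rule` is a hypothetical helper and does not check that iso14651_t1_common really defines a symbol with that name.

```python
def format_reorder_rule(anchor, moved):
    """Build the text of one reorder-after rule.

    `anchor` and each entry of `moved` are single characters. The symbol
    names are derived from the code points, following the <SXXXX> pattern
    used in this thread (an assumption, not verified against glibc).
    """
    def symbol(ch):
        return f"<S{ord(ch):04X}>"

    lines = [f"reorder-after {symbol(anchor)}"]
    lines += [symbol(ch) for ch in moved]
    return "\n".join(lines)


# U+0646 is the anchor; U+0648 and U+06C6 are moved after it.
rule = format_reorder_rule("\u0646", ["\u0648", "\u06C6"])
print(rule)
```

The generated text corresponds to the second rule shown above, without the trailing `%` comments.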

I found writing the test file and checking which characters are sorted
wrongly by default easier than staring at iso14651_t1_common. And it
is a good idea to have the test file anyway to make sure that the
Kurdish sort order always stays correct when something is changed in
glibc. If we have the test file, we will notice when some change causes a problem.
