Ubuntu
langpack-locales package

Bug #266975
Comment #93

Comment 93 for bug 266975

In Sourceware.org Bugzilla #9809, Jwtiyar Nariman (jwtiyar) wrote on 2020-01-14:

(In reply to Mike FABIAN from comment #63)
> (In reply to Jwtiyar Nariman from comment #62)
>
> > > Other characters not in this test file are sorted according to the
> > > defaults from
> > >
> > > copy "iso14651_t1"
> >
> > Sorting is good now, but I do not understand why these rules were added:
> >
> > reorder-after <S0631> % ر
> > <S0695> % ڕ
> >
> > reorder-after <S0646> % ن
> > <S0648> % و
> > <S06C6> % ۆ
> >
> > For example, how did you decide where to order "<S0695> % ڕ"?
>
> copy "iso14651_t1"
>
> contains
>
> copy "iso14651_t1_common"
>
> and some modifications which affect only Chinese and Japanese.
>
> So we look into the iso14651_t1_common file to see what the default sort
> order is.
>
> We find for example:
>
> ...
> <S0631> % ARABIC LETTER REH
> <S0632> % ARABIC LETTER ZAIN
> <S0691> % ARABIC LETTER RREH
> <S0692> % ARABIC LETTER REH WITH SMALL V
> <S0693> % ARABIC LETTER REH WITH RING
> <S0694> % ARABIC LETTER REH WITH DOT BELOW
> <S0695> % ARABIC LETTER REH WITH SMALL V BELOW
> <S0696> % ARABIC LETTER REH WITH DOT BELOW AND DOT ABOVE
> ...
>
> Looking at this you see that ڕ U+0695 ARABIC LETTER REH WITH SMALL V BELOW
> is sorted right after ڔ U+0694 ARABIC LETTER REH WITH DOT BELOW by default.
> That is not what you want for Kurdish. For Kurdish, you want
> ڕ U+0695 ARABIC LETTER REH WITH SMALL V BELOW to be sorted right after
> ر U+0631 ARABIC LETTER REH.
>
> This is achieved by the rule:
>
> reorder-after <S0631> % ر
> <S0695> % ڕ
>
> Which removes U+0695 from its default position in the sort order
> and inserts it again after U+0631.
>
> reorder-after <S0646> % ن
> <S0648> % و
> <S06C6> % ۆ
>
> does a similar thing to change the sorting of U+0648 and U+06C6.
>
> To find out which of these rules I need, I created the ckb_IQ.UTF-8.in
> test file first and wrote the Kurdish characters in the order you wanted
> into that file.
>
> Then I ran a test sort using a ckb_IQ locale which had *only*
>
> LC_COLLATE
> copy "iso14651_t1"
> END LC_COLLATE
>
> and *nothing* else.
>
> The test sort showed that only U+0695, U+0648, and U+06C6 were sorted
> incorrectly. All other characters from your list of Kurdish characters
> were sorted correctly already. So I needed only to add rules to fix the
> sort order for these 3 characters.
>
> You can see the same by reading iso14651_t1_common and finding out which
> of the Kurdish characters are already in the correct order in that file
> and which are not. You do not have to do anything for the characters which
> are already in the correct order. For the characters which are in the wrong
> position in iso14651_t1_common, you add rules like
>
> reorder-after <... collating-symbol after which to reorder ...>
> <... the collating-symbol which should be reordered ...>
>
> I found writing the test file and checking which characters are sorted
> wrongly by default easier than staring at iso14651_t1_common. And it
> is a good idea to have the test file anyway to make sure that the
> Kurdish sort order always stays correct when something is changed in
> glibc. If we have the test file, we will notice when some change causes a
> problem.
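
The effect of the reorder-after rules quoted above can be modelled with a short Python sketch. This is only a simplified illustration, not how localedef works internally: it treats the sort order as one flat list of collating symbols and ignores the multiple weight levels that real collation uses. The function name `reorder_after` and the list `default_order` are hypothetical, and the list holds only the excerpt of iso14651_t1_common shown in the quote.

```python
def reorder_after(order, anchor, symbols):
    """Return a new order with `symbols` moved to directly follow `anchor`.

    Simplified model: each symbol is removed from its default position
    and reinserted, in the given sequence, right after the anchor symbol.
    """
    moved = set(symbols)
    remaining = [s for s in order if s not in moved]
    pos = remaining.index(anchor) + 1
    return remaining[:pos] + list(symbols) + remaining[pos:]


# Excerpt of the default order quoted from iso14651_t1_common.
default_order = [
    "S0631",  # ARABIC LETTER REH
    "S0632",  # ARABIC LETTER ZAIN
    "S0691",  # ARABIC LETTER RREH
    "S0692",  # ARABIC LETTER REH WITH SMALL V
    "S0693",  # ARABIC LETTER REH WITH RING
    "S0694",  # ARABIC LETTER REH WITH DOT BELOW
    "S0695",  # ARABIC LETTER REH WITH SMALL V BELOW
    "S0696",  # ARABIC LETTER REH WITH DOT BELOW AND DOT ABOVE
]

# Models the rule: reorder-after <S0631> followed by <S0695>
kurdish_order = reorder_after(default_order, "S0631", ["S0695"])
print(kurdish_order)
```

In this model S0695 ends up directly after S0631 and every other symbol keeps its relative position, which matches the description of the rule in the quote.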

Thank you very much, dear Mike, I got it. You did a great job, thanks again.
So now everything is ready to be accepted in glibc.

Best Regards
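
The test-file approach described in the quoted comment (write the characters in the wanted order, sort them, and see which ones move) could be sketched in Python roughly as follows. This is an assumption-laden illustration and not the glibc test harness: the names `misplaced`, `check_with_locale`, `default_rank`, and `wanted` are hypothetical, and `default_rank` is a hand-written stand-in for the default order of three characters taken from the iso14651_t1_common excerpt. Checking against a real locale needs that locale to be compiled and installed, so that part is wrapped in a function and not called here.

```python
import locale


def misplaced(expected, key):
    """Return (expected, actual) pairs at positions where sorting disagrees."""
    actual = sorted(expected, key=key)
    return [(e, a) for e, a in zip(expected, actual) if e != a]


def check_with_locale(expected, locale_name="ckb_IQ.UTF-8"):
    # Assumption: the named locale is installed on this system;
    # locale.setlocale raises locale.Error otherwise.
    locale.setlocale(locale.LC_COLLATE, locale_name)
    return misplaced(expected, key=locale.strxfrm)


# Stand-in for the default order of three characters (hypothetical ranks
# that follow the quoted excerpt: REH, then ZAIN, then REH WITH SMALL V BELOW).
default_rank = {
    "\u0631": 0,  # ARABIC LETTER REH
    "\u0632": 1,  # ARABIC LETTER ZAIN
    "\u0695": 2,  # ARABIC LETTER REH WITH SMALL V BELOW
}

# Wanted Kurdish order: U+0695 directly after U+0631.
wanted = ["\u0631", "\u0695", "\u0632"]

print(misplaced(wanted, key=default_rank.__getitem__))
```

With the stand-in default ranks, the check reports the positions around U+0695. One displaced character also shifts its neighbour, so the report lists more positions than there are rules to write.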

