sorting files by name

Asked by fb268

Hi,
I could not find out how Ubuntu sorts files by name. I thought it used UTF-16 order, in which # comes before digits, which come before @, then capital letters, then small letters. But it does not work like that: # comes first, as expected, but after that come capital letters, then @, and then small letters. Is there another criterion besides the name itself?

Question information

Language: English
Status: Solved
For: Ubuntu
Assignee: No assignee
Solved by: fb268

actionparsnip (andrew-woodhead666) said :
#1

Is this in the terminal?

fb268 (fb268) said :
#2

In the graphical interface (GNOME/Unity), in the Files application.

Manfred Hampl (m-hampl) said :
#3

Sorting is a highly complicated task, and the sort order that applies to you is defined in the "locale settings".

What do you see as the setting for "LC_COLLATE" in the output of the command "locale"?
Take the first part of that value (the part before the dot, e.g. "en_US" from "en_US.UTF-8") and look at the file /usr/share/i18n/locales/"your_locale", e.g. /usr/share/i18n/locales/en_US
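
The lookup described above can be sketched in Python. This is only an illustration: the helper name locale_source_path is made up here, and keeping an "@modifier" suffix (as in "sr_RS.UTF-8@latin") is an assumption that goes beyond what the answer states.

```python
import locale

# Directory named in the answer above.
LOCALE_DIR = "/usr/share/i18n/locales"


def locale_source_path(setting: str) -> str:
    """Map an LC_COLLATE value such as 'en_US.UTF-8' to its definition file.

    Hypothetical helper: it drops the '.codeset' part and, as an assumption,
    keeps any '@modifier' suffix as part of the file name.
    """
    base, at, modifier = setting.partition("@")
    name = base.split(".", 1)[0]
    return f"{LOCALE_DIR}/{name}{at}{modifier}"


if __name__ == "__main__":
    try:
        # Adopt the collation setting from the environment, as "locale" reports it.
        locale.setlocale(locale.LC_COLLATE, "")
    except locale.Error:
        pass  # the configured locale is not installed; keep the default
    current = locale.setlocale(locale.LC_COLLATE)
    print("LC_COLLATE:", current)
    print("definition file:", locale_source_path(current))
```

If LC_COLLATE is "C" or "POSIX", the resulting path may not exist, since that locale is usually built in rather than read from a definition file.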

This file contains a section for LC_COLLATE, which in turn probably refers to another file (for en_US it copies "iso14651_t1").
The file /usr/share/i18n/locales/iso14651_t1 in turn includes iso14651_t1_common, and finally in /usr/share/i18n/locales/iso14651_t1_common you find the sorting order (don't ask me for the details of what all the columns mean).

In any case somewhere in that file you should see a table starting with

<U0061> <a>;<BAS>;<MIN>;IGNORE # 198 a
<U00AA> <a>;<PCL>;<EMI>;IGNORE # 199 ª
<U00E1> <a>;<ACA>;<MIN>;IGNORE # 200 á
<U00E0> <a>;<GRA>;<MIN>;IGNORE # 201 à
<U00E2> <a>;<CIR>;<MIN>;IGNORE # 202 â
<U00E3> <a>;<TIL>;<MIN>;IGNORE # 203 ã
<U00E4> <a>;<REU>;<MIN>;IGNORE # 204 ä
<U00E5> <a>;<RNE>;<MIN>;IGNORE # 205 å
<U0103> <a>;<BRE>;<MIN>;IGNORE # 206 <a(>
<U0105> <a>;<OGO>;<MIN>;IGNORE # 207 <a;>
<U0101> <a>;<MAC>;<MIN>;IGNORE # 208 <a->
<U01CE> <a>;<CAR>;<MIN>;IGNORE # 209 <a<>
<U01DF> <a>;<REU>;<MIN>;<MAC> # 210 ǟ
<U01E1> <a>;<PCT>;<MIN>;<MAC> # 211 ǡ
<U01FB> <a>;<RNE>;<MIN>;<ACA> # 212 ǻ
<U0201> <a>;<DGR>;<MIN>;IGNORE # 213 ȁ
<U0203> <a>;<IBR>;<MIN>;IGNORE # 214 ȃ
<U0227> <a>;<PCT>;<MIN>;IGNORE # 215 ȧ
<U1E01> <a>;<BRN>;<MIN>;IGNORE # 216 ḁ
<U1E9A> <a>;<PCL>;<MIN>;IGNORE # 217 ẚ
<U1EA1> <a>;<BPT>;<MIN>;IGNORE # 218 ạ
<U1EA3> <a>;<HOK>;<MIN>;IGNORE # 219 ả
<U1EA5> <a>;<CIR>;<MIN>;<ACA> # 220 ấ
<U1EA7> <a>;<CIR>;<MIN>;<GRA> # 221 ầ
<U1EA9> <a>;<CIR>;<MIN>;<HOK> # 222 ẩ
<U1EAB> <a>;<CIR>;<MIN>;<TIL> # 223 ẫ
<U1EAD> <a>;<CIR>;<MIN>;<BPT> # 224 ậ
<U1EAF> <a>;<BRE>;<MIN>;<ACA> # 225 ắ
<U1EB1> <a>;<BRE>;<MIN>;<GRA> # 226 ằ
<U1EB3> <a>;<BRE>;<MIN>;<HOK> # 227 ẳ
<U1EB5> <a>;<BRE>;<MIN>;<MAC> # 228 ẵ
<U1EB7> <a>;<BRE>;<MIN>;<BPT> # 229 ặ
<U00E6> "<a><e>";"<LIG><LIG>";"<MIN><MIN>";IGNORE # 230 æ
<U01E3> "<a><e>";"<LIG><LIG>";"<MIN><MIN>";<MAC> # 231 ǣ
<U01FD> "<a><e>";"<LIG><LIG>";"<MIN><MIN>";<ACA> # 232 ǽ
<U0062> <b>;<BAS>;<MIN>;IGNORE # 233 b
<U0253> <b>;<CRL>;<MIN>;IGNORE # 234 ɓ
<U1E03> <b>;<PCT>;<MIN>;IGNORE # 235 ḃ
<U1E05> <b>;<BPT>;<MIN>;IGNORE # 236 ḅ
<U1E07> <b>;<BMA>;<MIN>;IGNORE # 237 ḇ
<U0063> <c>;<BAS>;<MIN>;IGNORE # 238 c
<U00E7> <c>;<CDI>;<MIN>;IGNORE # 239 ç

This shows that the sort order for en_US (and for all other locales that use this collating sequence) does not follow Unicode code point order. Instead, the "normal a" comes first, then all the letters "a with a diacritic mark such as accents, hooks, umlauts, ligatures ..." in a specific sequence, then the "normal b", followed by "b with diacritic mark", then the "normal c", and so on.
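
To see the difference on your own machine, here is a minimal Python sketch that sorts a few made-up file names once by code point and once with the locale's collation rules. The locale name "en_US.UTF-8" and the helper sort_by_locale are assumptions for illustration, and the Files application may apply additional rules of its own, so its order can still differ from what this prints.

```python
import locale

# Made-up file names covering symbols, digits, both cases and an accent.
names = ["a.txt", "B.txt", "á.txt", "#x.txt", "@x.txt", "1.txt"]

# Plain sorted() compares Unicode code points: symbols and digits first,
# then capitals, then small letters, with accented letters after all of them.
by_code_point = sorted(names)


def sort_by_locale(items, locale_name="en_US.UTF-8"):
    """Sort items with the collation rules of locale_name.

    Returns None if that locale is not installed on this machine.
    """
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that compares per LC_COLLATE.
        return sorted(items, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


print("code point order:", by_code_point)
print("locale order:    ", sort_by_locale(names))
```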

I have not checked how the numbers and other symbols get sorted, but the information should also be somewhere in that file.
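
One way to look up how a particular symbol or digit is treated is to search that file for its code point. The Python sketch below assumes that each entry starts with a `<UXXXX>` token, as in the excerpt above; the helper name find_entries is made up, and the file layout may differ between glibc versions.

```python
import os

# File named in the answer above; it may be absent or compressed on some systems.
COLLATION_FILE = "/usr/share/i18n/locales/iso14651_t1_common"


def find_entries(lines, characters):
    """Return {character: line} for lines whose first token is that character's <UXXXX>."""
    wanted = {f"<U{ord(ch):04X}>": ch for ch in characters}
    found = {}
    for line in lines:
        parts = line.split(None, 1)
        if parts and parts[0] in wanted:
            # Keep only the first matching line for each character.
            found.setdefault(wanted[parts[0]], line.rstrip("\n"))
    return found


# Two lines copied from the excerpt above, used as a small self-check.
sample = [
    "<U0061> <a>;<BAS>;<MIN>;IGNORE # 198 a",
    "<U0062> <b>;<BAS>;<MIN>;IGNORE # 233 b",
]
sample_hits = find_entries(sample, "b#")

if os.path.exists(COLLATION_FILE):
    with open(COLLATION_FILE, encoding="utf-8", errors="replace") as handle:
        for ch, line in find_entries(handle, "#@1Aa").items():
            print(repr(ch), "->", line)
```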

If you do a web search for "Ubuntu sort sequence", you will find lots of questions and answers on that topic, e.g. https://unix.stackexchange.com/questions/406394/sort-files-alphabetically-with-ls-on-linux

fb268 (fb268) said :
#4

Thanks for your very informative response. I found all the information you pointed to.
After further tests and some hesitation, it seems that the Files application in Ubuntu ignores special characters, except for # at least. The order is then numbers, capital letters, small letters, # and probably some other special characters.
So a more robust way of naming files is to start the name with digits or letters rather than special characters.
Best,
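
The naming advice above could be checked with a small Python sketch like the following. The function name starts_with_letter_or_digit is hypothetical, and treating any Unicode letter or digit as acceptable is an assumption.

```python
def starts_with_letter_or_digit(filename: str) -> bool:
    """Return True if the first character is a letter or a digit."""
    # str.isalnum() is False for symbols such as '#', '@' and '_'.
    return bool(filename) and filename[0].isalnum()


for name in ["2024_report.txt", "Notes.txt", "#draft.txt", "@inbox.txt"]:
    verdict = "ok" if starts_with_letter_or_digit(name) else "starts with a special character"
    print(name, "->", verdict)
```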