How to deal with non-unicode tags?

Asked by Sergei Ivanov

In my region, most non-English MP3 files have tags (artist, title, etc.) encoded in CP1251, an 8-bit Cyrillic encoding used by Windows players. Is there any way to make Amarok display these tags correctly?

Many apps have a "fallback encoding" option for non-Unicode text, but I could not find one in Amarok. It seems to assume Latin-1 or something similar and displays the tags as strings full of accented Latin characters.
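
The effect described above can be reproduced in a few lines. This is a minimal sketch, assuming the tag bytes are CP1251 and the reader decodes them as Latin-1; the sample artist name is made up for illustration and nothing here is taken from Amarok's code.

```python
# Sample artist name, chosen only for illustration.
original = "Кино"

# Bytes as a CP1251-writing tagger would store them.
raw = original.encode("cp1251")

# What a reader shows if it assumes Latin-1 for 8-bit tag text.
garbled = raw.decode("latin-1")

# Latin-1 maps every byte to exactly one character, so the wrong
# decoding loses nothing and can be undone.
repaired = garbled.encode("latin-1").decode("cp1251")

print(garbled)
print(repaired)
```

Because the wrong decoding is lossless, a string that was already read as Latin-1 can still be recovered, provided the real encoding is known.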

Question information

Language: English
Status: Solved
For: Ubuntu amarok
Assignee: No assignee
Solved by: Harald Sitter
Harald Sitter (apachelogger) said :
#1

Amarok _only_ uses UTF-8 for encoding, so you'll have to convert your tags to UTF-8 (most Windows players should support that anyway).

Sergei Ivanov (svivanov) said :
#2

Thank you. My problem is not about compatibility of my files with Windows players (in fact, I never use Windows). It's more about playing files from a CD bought at a local store. As far as I can see, the tags in a file are either Unicode (16-bit) or in an 8-bit encoding. The latter is interpreted by Amarok according to LC_CTYPE if LC_CTYPE is 8-bit (I have seen this behavior on Debian etch), or as Latin-something if LC_CTYPE is UTF-8.
I tried to run Amarok with another LC_CTYPE, but then it cannot read non-ASCII filenames.
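
The behavior observed above can be modeled as a small rule. This is only a sketch of what was seen from the outside, not Amarok's actual logic; the function names `tag_codec_for_locale` and `decode_tag` are hypothetical, and "Latin-something" is assumed to mean Latin-1.

```python
import codecs
import locale


def tag_codec_for_locale(charset):
    """Pick the codec for 8-bit tag text, given the locale's charset."""
    # Normalize spellings such as "UTF-8", "utf8", "KOI8-R".
    name = codecs.lookup(charset).name
    # Assumption: a UTF-8 locale falls back to Latin-1 for 8-bit tags,
    # while an 8-bit locale charset is used as-is.
    return "latin-1" if name == "utf-8" else name


def decode_tag(raw, charset=None):
    """Decode raw 8-bit tag bytes the way the observed rule would."""
    if charset is None:
        # Charset of the current locale (derived from LC_CTYPE on Linux).
        charset = locale.getpreferredencoding(False)
    return raw.decode(tag_codec_for_locale(charset), errors="replace")
```

Under this model, changing LC_CTYPE changes the tag codec and the filename codec together, which matches the problem with non-ASCII filenames.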

Harald Sitter (apachelogger) said :
#3

Since when do CDs have actual files with tags and stuff stored on them?

Sergei Ivanov (svivanov) said :
#4

Why not? A CD can have a filesystem on it (or should I have said "CD-ROM"?).
Here you can buy a CD named like "the history of band X from 19XX to 19XX", full of MP3 songs
and other stuff (photos, lyrics, etc.). Some of these CDs are even legal :)
My car radio can play MP3 files from a data CD, and it's not an exotic device.

Back to the topic, I see that it's best to let amarok manage a collection on a hard drive and recode the tags
the way it likes. Now the question is: how to recode the tags? Can you suggest a tool for that?

On this system, 'apt-cache search mp3 tag' returns 65 packages and I'm sufficiently lost in this maze.
And all command-line players I tried do not support Unicode tags, so I wonder which of the tag editors do that.
Amarok itself and another GUI tool (called Kid3) do not help - I cannot read the tags in the first place
(because of the encoding), and even if I could, I would not be happy to retype every tag of every song.

Harald Sitter (apachelogger) said (best answer) :
#5

*lol*

I suggest EasyTAG; it's a fairly advanced tagging application. There is probably a CLI-based solution as well (have a look at http://amarok.kde.org/forum).

Sergei Ivanov (svivanov) said :
#6

Thank you. Indeed easytag understands encodings and can work with whole directories.
It turns out that there can be two kinds of tags, ID3v1 (language-dependent 8-bit encoding)
and ID3v2 (unicode), and both can be present in the same file. The problem with amarok is
caused by files where ID3v1 is present but ID3v2 is not. Easytag can read ID3v1 using a
configurable encoding and store the data in ID3v2; the result is suitable for amarok.
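
Reading a lone ID3v1 tag with a configurable encoding can be sketched as follows. This is a minimal illustration using only the Python standard library, not what EasyTAG does internally; the function names `parse_id3v1` and `read_id3v1` are hypothetical, and the default of CP1251 is an assumption that fits the files discussed here. Writing the result back as ID3v2 is not shown, since that needs a tagging tool or library.

```python
import struct

ID3V1_SIZE = 128
# ID3v1 layout: "TAG", title, artist, album, year, comment, genre byte.
_LAYOUT = struct.Struct("<3s30s30s30s4s30sB")


def parse_id3v1(block, encoding="cp1251"):
    """Decode a 128-byte ID3v1 block, or return None if it is not one."""
    if len(block) != ID3V1_SIZE or not block.startswith(b"TAG"):
        return None
    _, title, artist, album, year, comment, genre = _LAYOUT.unpack(block)

    def text(field):
        # Fields are fixed-width, padded with NUL bytes or spaces.
        return field.split(b"\x00", 1)[0].decode(encoding, "replace").strip()

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "genre": genre,
    }


def read_id3v1(path, encoding="cp1251"):
    """Read the ID3v1 tag stored in the last 128 bytes of a file."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to the end to learn the file size
        if f.tell() < ID3V1_SIZE:
            return None
        f.seek(-ID3V1_SIZE, 2)
        return parse_id3v1(f.read(ID3V1_SIZE), encoding)
```

A call such as `read_id3v1("song.mp3", encoding="koi8-r")` would decode the same bytes with a different 8-bit encoding; the file name here is a placeholder.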

Still IMHO it would be handy if amarok had a configurable encoding for those lonely ID3v1 tags.
Kaffeine has such an option. Kate, konqueror and friends handle encodings very well.
Please send a feature request to upstream devs if you think this is appropriate.

Harald Sitter (apachelogger) said :
#7

I'm Amarok's Project Manager :-P

Anyway, in the past we had a reencode feature, but we had to remove it since it did more harm than good. We simply came to the understanding that every file you rip on Linux nowadays will probably be v2 by default (mainly because there is just no reason to prefer v1), and for all files which are still v1 (like those ripped on Windows) there are millions of applications/scripts out there to convert them to v2+unicode.

Sergei Ivanov (svivanov) said :
#8

> Anyway, in the past we had a reencode feature, but we had to remove it
> since it did more harm than good. We simply came to the understanding
> that every file you rip on Linux nowadays will probably be v2 by default
> (mainly because there is just no reason to prefer v1), and for all files
> which are still v1 (like those ripped on Windows) there are millions of
> applications/scripts out there to convert them to v2+unicode.

I see the point. But did you try to test this theory - render v1 tags
in Greek rather than Latin and see if users complain?

I don't know what the reencode feature did; I meant an option
for *reading* v1 tags. There is no need to write them back
in a different encoding (or change them at all).