problem reading extended ascii files

Asked by MaskedMarauder

I can't figure out how to convert some extended-ASCII files into UTF-8 or another conventional encoding.

I'm reading some files from a government source (FCC) which come in as ASCII, sort of. The file command reports them as:

$ file DA-09-735A1.txt
DA-09-735A1.txt: Non-ISO extended-ASCII English text, with CRLF, NEL line terminators

When I try to read the file with less (in gnome-terminal) or vi, I get stuff like:

...
1. The Media Bureau (<93>Bureau<94>) has before it for comparative consideration 21 groups of
mutually exclusive applications for new or modified noncommercial educational (<93>NCE<94>) FM station ...

Reading the file with more, konqueror or ooffice, I get stuff like:
... 1. The Media Bureau (�Bureau�) has before...

Reading the file with gedit:
... 1. The Media Bureau (“Bureau”) has before...

Reading the file with emacs-nox:
... 1. The Media Bureau (\223Bureau\224) has before...

If I use lynx to read the plain text file it tosses the "funny" characters away.
If I use firefox to read the plain text file it all looks proper, so I know something can figure it out.

dos2unix doesn't fix it.

I've tried iconv, tcs and a few other tricks, but nothing seems to work.
iconv -f ISO-8859-1 works a little bit: I get some special characters, like the section signs in '... C.F.R. §§ 1.65 and 73.7003(e). ...', that don't otherwise appear. But nothing I've tried so far (other than firefox) groks the open/close double quotes.

Am I just not guessing the proper 'from' encoding? Or is something darker afoot?

I don't know what tool the FCC is using to generate the documents.

Question information

Language:
English
Status:
Answered
For:
Ubuntu
Assignee:
No assignee
jari (jaript) said :
#1

Check which encoding Firefox is using: click View -> Character Encoding. If the document is displayed correctly, the encoding selected there is the right one, so pass that to iconv as the 'from' encoding. I think this one is Windows-1252, so try "iconv -f windows-1252" on these files.
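
To illustrate the suggestion, here is a minimal Python sketch, assuming the files really are Windows-1252 (the thread only guesses at this; it is not confirmed). The sample bytes, the function names to_utf8 and convert_file, and the output file name in the usage note are all made up for illustration. In Windows-1252 the bytes 0x93 and 0x94 (shown as <93>/<94> by less and as \223/\224 by emacs) are the curly double quotes, while ISO-8859-1 treats them as C1 control characters. That would explain why "iconv -f ISO-8859-1" brings out the § signs but not the quotes.

```python
# Hypothetical sample: 0x93/0x94 (octal \223/\224) around "Bureau",
# and 0xA7, the section sign that also showed up via ISO-8859-1.
SAMPLE = b"The Media Bureau (\x93Bureau\x94) ... C.F.R. \xa7\xa7 1.65"


def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode raw bytes from source_encoding and re-encode them as UTF-8.

    The cp1252 default is an assumption based on the guess in this thread.
    """
    return raw.decode(source_encoding).encode("utf-8")


def convert_file(src_path: str, dst_path: str,
                 source_encoding: str = "cp1252") -> None:
    """Write a UTF-8 copy of src_path to dst_path.

    Files are opened in binary mode, so CRLF line endings are left as-is.
    """
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_utf8(raw, source_encoding))


if __name__ == "__main__":
    # Windows-1252 maps 0x93/0x94 to U+201C/U+201D (curly double quotes).
    print(SAMPLE.decode("cp1252"))
    # ISO-8859-1 maps the same bytes to invisible C1 control characters,
    # so only the section signs come through as visible text.
    print(repr(SAMPLE.decode("latin-1")))
```

Something like convert_file("DA-09-735A1.txt", "DA-09-735A1.utf8.txt") would then produce a UTF-8 copy. Two caveats: Python's cp1252 codec raises UnicodeDecodeError on the handful of byte values Windows-1252 leaves undefined (such as 0x81), and the byte that file reports as a NEL line terminator (0x85) is an ellipsis character in Windows-1252, so it may not be a line terminator at all.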
