Problem reading extended ASCII files
I can't figure out how to convert some extended ASCII files into UTF-8 or another conventional encoding.
I'm reading some files from a government source (FCC) that come in as ASCII. Sort of. file reports them as:
$ file DA-09-735A1.txt
DA-09-735A1.txt: Non-ISO extended-ASCII English text, with CRLF, NEL line terminators
When I try to read the file with less (in gnome-terminal) or vi, I get stuff like:
...
1. The Media Bureau (<93>Bureau<94>) has before it for comparative consideration 21 groups of
mutually exclusive applications for new or modified noncommercial educational (<93>NCE<94>) FM station ...
Reading the file with more, konqueror or ooffice, I get stuff like:
... 1. The Media Bureau (�Bureau�) has before...
Reading the file with gedit:
... 1. The Media Bureau (Bureau) has before...
Reading the file with emacs-nox:
... 1. The Media Bureau (\223Bureau\224) has before...
If I use lynx to read the plain text file it tosses the "funny" characters away.
If I use firefox to read the plain text file it all looks proper, so I know something can figure it out.
dos2unix doesn't fix it.
I've tried iconv, tcs and a few other tricks, but nothing seems to work.
iconv -f ISO-8859-1 works a little bit: I get some special characters, like '... C.F.R. §§ 1.65 and 73.7003(e). ...', that don't otherwise appear. But nothing I've tried so far (other than firefox) groks the open/close double quotes.
Am I just not guessing the proper 'from' encoding? Or is something darker afoot?
I don't know what tool the FCC is using to generate the documents.
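One 'from' encoding that fits the symptoms above is Windows-1252. That is only a guess, since the generating tool is unknown. In that code page the bytes 0x93 and 0x94 (the \223 and \224 that emacs shows) are the opening and closing curly double quotes, and 0x85 (which file reports as NEL) is an ellipsis. In ISO-8859-1 the same bytes are invisible control characters. A minimal sketch that tests the guess on a sample built from those bytes, rather than on the real file:

```shell
# Assumption: the file is Windows-1252, not ISO-8859-1.
# Build the sample "(<93>Bureau<94>)" from the raw bytes (octal 223 and 224)
# and convert it to UTF-8 with iconv.
sample=$(printf '(\223Bureau\224)' | iconv -f WINDOWS-1252 -t UTF-8)
printf '%s\n' "$sample"

# If the sample shows proper curly quotes, the same conversion can be tried
# on the whole file. The output file name below is hypothetical.
# iconv -f WINDOWS-1252 -t UTF-8 DA-09-735A1.txt > DA-09-735A1.utf8.txt
```

If the quotes still come out wrong, the guess is wrong for these files and another source encoding needs to be tried.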
Question information
- Language: English
- Status: Answered
- For: Ubuntu
- Assignee: No assignee