Ubuntu

problem reading extended ascii files

Asked by MaskedMarauder on 2009-04-01

I can't figure out how to convert some extended-ASCII files into UTF-8 or another conventional encoding.

I'm reading some files from a government source (the FCC) that come in as ASCII. Sort of. file reports them as:

$ file DA-09-735A1.txt
DA-09-735A1.txt: Non-ISO extended-ASCII English text, with CRLF, NEL line terminators

When I read the file with less (in gnome-terminal) or vi, I get stuff like:

...
1. The Media Bureau (<93>Bureau<94>) has before it for comparative consideration 21 groups of
mutually exclusive applications for new or modified noncommercial educational (<93>NCE<94>) FM station ...

Reading the file with more, konqueror or ooffice, I get stuff like:
... 1. The Media Bureau (�Bureau�) has before...

Reading the file with gedit:
... 1. The Media Bureau (Bureau) has before...

Reading the file with emacs-nox:
... 1. The Media Bureau (\223Bureau\224) has before...

If I use lynx to read the plain text file it tosses the "funny" characters away.
If I use firefox to read the plain text file it all looks proper, so I know something can figure it out.

dos2unix doesn't fix it.

I've tried iconv, tcs and a few other tricks, but nothing seems to work.
iconv -f ISO-8859-1 works a little bit: I get some special characters, like '... C.F.R. §§ 1.65 and 73.7003(e). ...', that don't otherwise appear. But nothing I've tried so far (other than firefox) groks the open/close double quotes.

Am I just not guessing the proper 'from' encoding? Or is something darker afoot?

I don't know what tool the FCC is using to generate the documents.

Question information

Language: English

Status: Answered

For: Ubuntu

Assignee: No assignee

Last query: 2009-04-01

Last reply: 2009-04-01

jari (jaript) said on 2009-04-01:

Check which encoding Firefox is using, then: click View->Character Encoding. If the document is shown correctly, whatever is selected there is the right encoding, so pass that to iconv as the 'from' encoding. I think this one is Windows-1252, so try "iconv -f windows-1252" on these files.
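
For what it's worth, the Windows-1252 guess fits the symptoms: in Windows-1252 the bytes 0x93 and 0x94 (the <93>/<94> and \223/\224 shown above) are the left and right curly double quotes, while ISO-8859-1 maps them to control characters. That would explain why -f ISO-8859-1 recovers the § signs but not the quotes. Here is a minimal sketch, assuming the files really are Windows-1252 and that an iconv supporting that name is installed. The helper name to_utf8 and the output filename are made up for illustration, and stripping the carriage returns is optional.

```shell
#!/bin/sh
# to_utf8 is a hypothetical helper, not part of any package.
# Assumption: the input file is Windows-1252 (CP1252) encoded.
# Writes the UTF-8 version to stdout; tr drops the CR of each CRLF pair.
to_utf8() {
    iconv -f WINDOWS-1252 -t UTF-8 "$1" | tr -d '\r'
}

# Example usage (the output filename is arbitrary):
# to_utf8 DA-09-735A1.txt > DA-09-735A1.utf8.txt
```

If iconv stops with an "illegal input sequence" error, the file probably contains bytes that Windows-1252 leaves undefined, and the encoding guess would need another look.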
