Running catman makes man display junk data

Bug #615045 reported by ooze
This bug affects 1 person
Affects           Status         Importance   Assigned to    Milestone
man-db (Debian)   Fix Released   Unknown
man-db (Ubuntu)   Fix Released   High         Colin Watson
Lucid             Fix Released   High         Colin Watson

Bug Description

After running "sudo catman" to update the man page cache, the display of some man pages will be completely corrupted. Deleting the corresponding entries in /var/cache/man/cat[1-8]/ will bring back the original man pages.

I can reproduce this problem with man-db 2.5.7-3, but not with the previous 2.5.6-2. I suspect it is related to the following upstream change in 2.5.7-1:

> - Always save cat pages in UTF-8 (closes: #446741).

I see the following pipe being run:

/usr/bin/zsoelim | /usr/lib/man-db/manconv -f UTF-8:ISO-8859-1 -t ANSI_X3.4-1968//IGNORE | tbl | nroff -mandoc -Tascii | gzip -c7 | iconv -c -f ANSI_X3.4-1968 -t UTF-8//TRANSLIT

This is very wrong because it tries to convert gzip-compressed data from ASCII to UTF-8! Oh noes!
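
To illustrate why the order matters, here is a minimal Python sketch. It is not man-db code: the helper ascii_to_utf8_dropping_invalid is a hypothetical stand-in for `iconv -c -f ANSI_X3.4-1968 -t UTF-8`, on the assumption that iconv -c simply discards bytes that are not valid ASCII, and the sample page text is made up. Running the conversion on gzip output strips the non-ASCII bytes from the stream, starting with the second magic byte (0x8b), so the result is no longer a gzip file.

```python
import gzip
import zlib


def ascii_to_utf8_dropping_invalid(data: bytes) -> bytes:
    """Hypothetical stand-in for `iconv -c -f ASCII -t UTF-8`.

    Assumption: bytes outside the ASCII range are silently dropped.
    """
    return data.decode("ascii", errors="ignore").encode("utf-8")


def can_decompress(blob: bytes) -> bool:
    """Return True if blob is a readable gzip stream."""
    try:
        gzip.decompress(blob)
        return True
    except (OSError, EOFError, zlib.error):
        return False


# Made-up formatter output, standing in for what nroff would produce.
page = b"NAME\n       a2p - Awk to Perl translator\n"

# Order seen in the bug: compress first, then convert the compressed bytes.
broken = ascii_to_utf8_dropping_invalid(gzip.compress(page))

# Intended order: convert the text first, then compress it.
fixed = gzip.compress(ascii_to_utf8_dropping_invalid(page))

print("compress then convert readable:", can_decompress(broken))
print("convert then compress readable:", can_decompress(fixed))
```

With pure ASCII input the conversion step is a no-op on the text itself, which is why only the ordering, not the conversion, causes the damage.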

== Regression details ==
Discovered in version: 2.5.7-2 (lucid), 2.5.7-3 (maverick)
Last known good version: 2.5.6-2 (karmic)

== SRU details ==
Impact: Preformatted manual pages ("cat pages") may be corrupted by running iconv after compression rather than before. This is a regression introduced upstream in man-db 2.5.7.
Patch: Fixed upstream in http://bazaar.launchpad.net/~cjwatson/man-db/trunk/revision/1220 and backported trivially to Debian and Ubuntu Maverick. No problems reported.
TEST CASE: Run 'sudo catman 1', then make sure you're using an 80-column terminal window (so that cat pages are used) and run 'LC_ALL=C man a2p'. The broken version will show binary garbage. To clear out cat pages to test the working version, run 'sudo rm /var/cache/man/cat*/*'.
Regression potential: None seems likely, and the test case should be sufficient to catch misbuilds and the like.
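
As a complement to the manual test case, the following sketch scans the cat page cache for files that no longer decompress. It is only an illustration under stated assumptions: the function name find_corrupt_cat_pages is hypothetical, cat pages are assumed to be gzip-compressed with a .gz suffix, and only the top-level cat*/ directories under /var/cache/man are checked (per-locale subdirectories are not).

```python
import gzip
import zlib
from pathlib import Path


def find_corrupt_cat_pages(cache_root="/var/cache/man"):
    """Return paths of cat*/*.gz files under cache_root that fail to decompress.

    Hypothetical helper; assumes gzip-compressed cat pages.
    """
    corrupt = []
    for path in sorted(Path(cache_root).glob("cat*/*.gz")):
        try:
            with gzip.open(path, "rb") as fh:
                # Read the whole stream in chunks so that corruption
                # past the header is detected as well.
                while fh.read(65536):
                    pass
        except PermissionError:
            # An unreadable file is not the same as a corrupt one.
            raise
        except (OSError, EOFError, zlib.error):
            corrupt.append(path)
    return corrupt
```

Calling find_corrupt_cat_pages() after 'sudo catman 1' on an affected version should list the damaged files; on a fixed version the list is expected to be empty.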

---
Original question by Thinboy00:

Some man pages are appearing corrupt (i.e. if I type man foo at the terminal, I get a bunch of caret escaped characters and a few ascii characters). It looks as if man is reading the compressed data in /usr/share/man instead of uncompressing it first. I will soon attach a screenshot of the problem. The really confusing part, though, is that some man pages consistently appear corrupt and the others consistently appear correctly. I've tried the following, to no avail:
mandb
mandb -t
mandb -c
catman
And yes, I did remember to use sudo on those. mandb -t said "whatis parse for /usr/share/man/[etc]:whatis parse for foo(x) failed" about some pages, but not every broken page (most of them appeared to be related to perl, but not all of them). Since whatis is working correctly, I find it hard to believe that's the problem. I also tried this:
zcat /usr/man/man6/nethack.6.gz | less
and it gave unformatted but readable output, even though man nethack doesn't work. I tried the same trick with slashem, which does work correctly for man, and the zcat trick worked too. I'm surprised their man pages behave differently since the pages themselves are practically identical.
Is my copy of man broken?

ooze (zoe-gauthier)
affects: ubuntu → man-db (Ubuntu)
Changed in man-db (Ubuntu):
status: New → Confirmed
ooze (zoe-gauthier)
tags: added: regression-release
ooze (zoe-gauthier)
summary: - Cached (compressed) man pages are corrupted by conversion to UTF-8
+ Running catman makes man display junk data
description: updated
ooze (zoe-gauthier) wrote :

If I run `sudo man -L fr_CA.utf8 -caM /usr/share/man 6 nethack' from the command line, no cache file is created and so the bug does not happen. If I create a small script that runs the same command but clears the environment, I get a junk man page. Here is such a script:

#!/usr/bin/python
import os
# Replace this process with man, passing an empty environment (no LANG or LC_* set).
os.execve("/usr/bin/man", ["man", "-L", "fr_CA.utf8", "-caM", "/usr/share/man", "6", "nethack"], {})

If I restore the environment variable about the locale, a cat file is created, but I do not get the bug:

#!/usr/bin/python
import os
# Same call, but with LANG restored in the otherwise empty environment.
os.execve("/usr/bin/man", ["man", "-L", "fr_CA.utf8", "-caM", "/usr/share/man", "6", "nethack"], {'LANG': 'fr_CA.utf8'})

So a possible solution would be for catman to add a LANG environment variable to the execve call to man. I don't know enough about the use cases of catman; is the locale supposed to be valid? Is it good enough to be used for system-wide purposes? Otherwise, should catman fake a UTF-8 locale?

Colin Watson (cjwatson) wrote :

Could you please add the -d option to man and post the *full* output, so that I can trace through what's happening?

Colin Watson (cjwatson) wrote :

BTW, the locale difference is interesting, but any fix based on tweaking it is likely to be incorrect. I'd rather find the underlying cause.

Colin Watson (cjwatson) wrote :

Ah, never mind, I just managed to reproduce it myself ...

Changed in man-db (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → High
assignee: nobody → Colin Watson (cjwatson)
Colin Watson (cjwatson) wrote :

Fixed upstream. I intend to backport this to Debian and Ubuntu.

Tue Aug 17 14:29:53 BST 2010 Colin Watson <email address hidden>

        * src/man.c (display_catman): Add iconv to format_cmd before adding
          a compressor.
        * NEWS: Document this.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package man-db - 2.5.7-4

---------------
man-db (2.5.7-4) unstable; urgency=low

  * Backport from trunk:
    - Fix a regression introduced in 2.5.7 when running catman in some
      locales, most notably in the C locale: while converting the output to
      UTF-8, iconv was run after the compressor rather than before it
      (closes: #593350, LP: #615045).
 -- Colin Watson <email address hidden> Tue, 17 Aug 2010 15:02:31 +0100

Changed in man-db (Ubuntu):
status: Triaged → Fix Released
Colin Watson (cjwatson)
Changed in man-db (Ubuntu Lucid):
status: New → Triaged
importance: Undecided → High
assignee: nobody → Colin Watson (cjwatson)
Kenny Root (kennyr) wrote :

Is this going to be put into Lucid?

As a workaround, you can do something like: rm -f /var/cache/man/cat*/*.gz

Colin Watson (cjwatson) wrote :

Yes, I ought to. Milestoning for 10.04.2 so I don't forget.

Changed in man-db (Ubuntu Lucid):
milestone: none → ubuntu-10.04.2
Colin Watson (cjwatson)
description: updated
Colin Watson (cjwatson)
Changed in man-db (Ubuntu Lucid):
status: Triaged → In Progress
Martin Pitt (pitti) wrote : Please test proposed package

Accepted man-db into lucid-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in man-db (Ubuntu Lucid):
status: In Progress → Fix Committed
tags: added: verification-needed
Jean-Baptiste Lallement (jibel) wrote :

SRU verification for Lucid:
I have reproduced the problem with man-db 2.5.7-2 in lucid and have verified that the version of man-db 2.5.7-2ubuntu1 in -proposed fixes the issue.

Marking as verification-done

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package man-db - 2.5.7-2ubuntu1

---------------
man-db (2.5.7-2ubuntu1) lucid-proposed; urgency=low

  * Backport from trunk:
    - Fix a regression introduced in 2.5.7 when running catman in some
      locales, most notably in the C locale: while converting the output to
      UTF-8, iconv was run after the compressor rather than before it
      (closes: #593350, LP: #615045).
 -- Colin Watson <email address hidden> Thu, 30 Sep 2010 13:55:29 +0100

Changed in man-db (Ubuntu Lucid):
status: Fix Committed → Fix Released
Changed in man-db (Debian):
status: Unknown → Fix Released
tags: added: testcase