xsane2tess problem
The trouble in post 136345 is still here. Xsane works fine by itself. It works with gocr to do OCR, but gocr doesn't give good results; tesseract is much better. However, xsane2tess doesn't work for me. The problem seems to be that the .tif file produced by ImageMagick isn't recognised by tesseract. Image Viewer doesn't recognise it either, though Document Viewer and GIMP do. The scanner is an HP Photosmart C3180 all-in-one, and it produces a PGM image file. For the scan I've been using, the PGM file is 2.0MB and the TIF file is 22MB. Perhaps one of ImageMagick's options will produce a recognisable file, but which? I've tried removing the '-compress none' option, to no avail. Thanks for your help.
Question information
- Language: English
- Status: Answered
- For: Ubuntu xsane
- Assignee: No assignee
- Last query: 2010-12-21
- Last reply: 2010-12-21
Richard Wilmot (richardglobal) said (#1):
The image type of the .tif file given under 'Properties' is PNM, if this helps.
mycae (mycae) said (#2):
Hello again,
PNM is not a TIFF format; it is a Portable aNyMap file, which is something different:
https:/
ImageMagick is not writing a TIFF in this case. GIMP is probably recognizing the file from its contents, so it can load it anyway.
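GIMP, like the `file` command, decides what an image is from the bytes at the start of the file rather than from its name. The following is a minimal sketch of the same idea, assuming bash with `head` and `od` available; `sniff_image_type` is a hypothetical helper, not part of xsane2tess. A classic TIFF starts with `II*` plus a zero byte (little-endian) or `MM`, a zero byte and `*` (big-endian); the PNM family starts with `P1` to `P6`.

```shell
#!/usr/bin/env bash
# Sketch: identify an image by its leading "magic" bytes, ignoring its name.
# sniff_image_type is a hypothetical helper, not part of xsane2tess.
# Only classic TIFF and the PNM family (PBM/PGM/PPM) are recognised.
sniff_image_type() {
    local hex
    # First four bytes as lowercase hex, e.g. "49492a00" for "II*" + NUL.
    hex=$(head -c 4 -- "$1" | od -An -tx1 | tr -d ' \n')
    case $hex in
        49492a00|4d4d002a) echo "tiff" ;;  # "II*\0" (little-endian) or "MM\0*" (big-endian)
        5031*|5034*)       echo "pbm"  ;;  # "P1" (ASCII) or "P4" (raw) bitmap
        5032*|5035*)       echo "pgm"  ;;  # "P2" or "P5" greymap
        5033*|5036*)       echo "ppm"  ;;  # "P3" or "P6" pixmap
        *)                 echo "unknown" ;;
    esac
}

# Report the detected type of every file named on the command line.
for f in "$@"; do
    printf '%s: %s\n' "$f" "$(sniff_image_type "$f")"
done
```

Running it on the .tif that xsane2tess leaves behind (e.g. `bash sniff.sh /path/to/file.tif`, where the script name is a placeholder) and getting `pgm` or `ppm` instead of `tiff` would suggest that the file is not really a TIFF, despite its name.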
I suspect that there is something wrong with the arguments at this line:
# converting image into TIFF (ImageMagick)
convert "$FILE_PATH" -compress none "$TIF_FILE" 1>&2
Can you provide a shell trace (bash -x ./xsane2tess, or wherever xsane2tess is located)? It should show the arguments passed to each command. You can then check the types of $FILE_PATH and $TIF_FILE afterwards.
For example
convert tmp.jpg tmp.tif
then
file tmp.tif
outputs
tmp.tif: TIFF image data, little-endian
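Running `bash -x` by hand only traces the script when you start it yourself. Since the failure happens when xsane runs xsane2tess, one option is to turn tracing on inside the script and send it to a log file. This is a sketch, assuming you can edit a copy of xsane2tess; `TRACE_LOG` and `/tmp/xsane2tess-trace.log` are hypothetical names chosen for the example.

```shell
#!/usr/bin/env bash
# Sketch: lines one might add near the top of a copy of xsane2tess so that a
# trace is recorded even when xsane, rather than a terminal, runs the script.
# TRACE_LOG and its default path are hypothetical choices, not part of xsane2tess.
TRACE_LOG=${TRACE_LOG:-/tmp/xsane2tess-trace.log}

exec 2>>"$TRACE_LOG"        # append everything written to stderr to the log
PS4='+ line ${LINENO}: '    # prefix each traced command with its line number
set -x                      # from here on, print each command before running it

# ... the rest of the script follows; the conversion step, for instance,
# would appear in the log with its fully expanded arguments, e.g.
# + line N: convert /path/to/scan -compress none /path/to/scan.tif
```

Because xsane2tess already ends its commands with `1>&2`, anything convert prints (such as a `no decode delegate` message) should land in the same log, next to the traced command that produced it.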
Richard Wilmot (richardglobal) said (#3):
Hi mycae
Thanks for your continued help. On starting up this morning I tried xsane2tess and it worked! But only once - now we're back to normal.
If I start from a JPEG file, convert works normally: it produces a .tif file of about the same size as the original, and tesseract converts it (Image Viewer likes it, too). If I use the file produced by the scanner via xsane, I get this (I've shortened the extremely long filename!):
convert xsaneorig.
convert: unable to open image `xsaneorig.
convert: no decode delegate for this image format `xsaneorig.
It looks as if xsane is producing a file format that imagemagick doesn't like.
The one time xsane2tess worked, the .tif file was 253kB, about the same size as the original. When it doesn't work, the .tif is nearly 100 times larger (22.6MB). One other thing: the original file should be about 250kB (about 2M pixels x 1 bit/pixel), but it is 2MB on disc.
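The size puzzle can be checked with a little arithmetic. In the binary PNM formats, a PBM (`P4`) image packs 8 pixels into each byte, while a PGM (`P5`) with a maximum value of 255 or less uses one byte per pixel. The sketch below assumes a hypothetical page of 1600 x 1250 pixels, chosen only because it comes to the 2M pixels mentioned above; `raster_bytes` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: raster size in bytes of a raw (binary) PNM image, ignoring the short
# text header at the start of the file.
#   P4 (PBM): 1 bit per pixel, 8 pixels per byte, each row padded to a whole byte
#   P5 (PGM): 1 byte per pixel when maxval <= 255
raster_bytes() {
    local kind=$1 width=$2 height=$3
    case $kind in
        P4) echo $(( (width + 7) / 8 * height )) ;;
        P5) echo $(( width * height )) ;;
        *)  echo "unsupported type: $kind" >&2; return 1 ;;
    esac
}

# Hypothetical page of 1600 x 1250 = 2,000,000 pixels:
raster_bytes P4 1600 1250   # 1-bit bitmap: prints 250000
raster_bytes P5 1600 1250   # 8-bit greymap: prints 2000000
```

So a file of about 2MB is what one would expect if xsane saved the scan as an 8-bit greyscale PGM, which matches the file type reported, rather than as a 1-bit image.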
mycae (mycae) said (#4):
Try renaming the file to xsaneorig.ppm, rather than xsaneorig.
You can also take that single file aside, run the ImageMagick commands on it yourself, and check the output.
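Renaming by hand works, but the right extension depends on which PNM variant xsane actually wrote. Below is a sketch of the copy-aside step that picks the extension from the file's own magic number, assuming bash and coreutils; `copy_as_pnm` and the destination directory are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy a scan aside under a name whose extension matches its PNM magic
# number, so ImageMagick can also infer the format from the name.
# copy_as_pnm is a hypothetical helper; dest_dir must already exist.
copy_as_pnm() {
    local src=$1 dest_dir=$2 magic ext dest
    magic=$(head -c 2 -- "$src")
    case $magic in
        P1|P4) ext=pbm ;;   # bitmap
        P2|P5) ext=pgm ;;   # greymap
        P3|P6) ext=ppm ;;   # pixmap
        *) echo "not a PNM file: $src" >&2; return 1 ;;
    esac
    dest="$dest_dir/$(basename -- "$src").$ext"
    cp -- "$src" "$dest" && echo "$dest"   # print the new name on success
}
```

For example, `copy_as_pnm xsaneorig ~/ocrtest` (placeholder paths) prints the new name, which can then be passed to the same convert command that xsane2tess uses.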
Richard Wilmot (richardglobal) said (#5):
Tried it: exactly the same result.
convert: unable to open image `xsaneorig.ppm': @ error/blob.
Richard Wilmot (richardglobal) said (#6):
As it was a PGM file I changed the extension to .pgm and tried again but the same result.
mycae (mycae) said (#7):
Can you upload that file?
Richard Wilmot (richardglobal) said (#8):
Certainly. I assume you mean the image file. Where do I upload it to? As an attachment to an email like this, or somewhere else? (I'm new to this!) Thanks for all your help.
On 18/12/10 18:43, mycae wrote:
> Your question #138107 on xsane in ubuntu changed:
> https:/
>
> Status: Open => Answered
>
> mycae proposed the following answer:
> Can you upload that file?
>
Richard Wilmot (richardglobal) said (#9):
When I use the image file with xsane2tess within xsane, I get the same result: a huge .tif file that tesseract doesn't recognise. If I copy the file into my home folder and run a test file containing:
convert /home/richard/
tesseract -i /home/richard/
then it converts to a proper .tif file (of the correct size), but produces the following output:
+ convert /home/richard/
+ tesseract -i /home/richard/
read_variables_
Adding '-compress none' makes no difference (not even to the size of the .tif file). Under 'Properties' the raw file is given as a PGM file, and under 'Properties - Image' the .tif file says 'failed to load image information'. Am I doing something really silly here?
mycae (mycae) said (#10):
Well, the tesseract command looks incorrect.
Looking at the manpage for tesseract (man tesseract -- synopsis section), the correct usage would be
tesseract /home/richard/
(Note that -i and -o are not valid options and should be dropped.)
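Following that synopsis, a corrected OCR step might look like the sketch below. It assumes tesseract's usual behaviour of appending `.txt` to the output base, so passing `scan.txt` would produce `scan.txt.txt`; `ocr_outputbase` and `ocr_tif` are hypothetical helper names, and the paths in the usage line are placeholders.

```shell
#!/usr/bin/env bash
# Sketch following the man page synopsis "tesseract imagename outputbase":
# there are no -i/-o flags, and tesseract appends ".txt" to outputbase itself.
# ocr_outputbase and ocr_tif are hypothetical helper names.
ocr_outputbase() {
    # "page.txt" -> "page"; a name without .txt is left unchanged
    echo "${1%.txt}"
}

ocr_tif() {
    local tif=$1 text_file=$2
    # 1>&2 mirrors xsane2tess, which sends each command's normal output to stderr
    tesseract "$tif" "$(ocr_outputbase "$text_file")" 1>&2
}

# Usage (placeholder paths): should write /tmp/scan.txt
# ocr_tif /tmp/scan.tif /tmp/scan.txt
```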
Richard Wilmot (richardglobal) said (#11):
Thanks - very stupid of me! When I run my corrected test file on the image file it works fine, but xsane2tess still doesn't work within xsane. In the xsane2tess code, what is the 1>&2 bit at the end of the commands for?
mycae (mycae) said (#12):
When programs run in a terminal, they can generate two kinds of output -- "standard output" and "standard error" (stdout and stderr).
You can redirect stdout to a file like this:
echo "hello world" > file.txt
Instead of printing "hello world" to the console, this saves it to file.txt.
But if the program spat out errors (what is an error is up to the author of the program - they can use either stream at will), then you will not capture these using the above, and the errors will be written to the terminal.
You can capture errors using the 2> notation (pretend error is a fictional program that prints an error message):
error "some error occurred" 2> errors.txt
In this case the error would be saved to errors.txt, but normal output would still be printed to the screen.
The 1>&2 sends stdout to wherever stderr is going, so both kinds of output end up in a single stream (stderr). Redirecting with 1> alone would only send standard output to a file, not error messages.
http://
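The redirections above can be tried out with a stand-in for the fictional error program. This is a sketch; `talk` is a hypothetical function that writes one line to each stream, and the files go into a temporary directory.

```shell
#!/usr/bin/env bash
# Sketch of the redirections discussed above, using a hypothetical stand-in
# program "talk" that writes one line to each stream.
talk() {
    echo "normal output"               # stdout, file descriptor 1
    echo "some error occurred" >&2     # stderr, file descriptor 2
}

dir=$(mktemp -d)

talk > "$dir/out.txt"          # stdout captured; the error line still goes to the terminal
talk 2> "$dir/errors.txt"      # stderr captured; the normal line still goes to the terminal

# 1>&2 points stdout at wherever stderr currently goes, so both lines
# travel on stderr; the outer 2> then captures both of them.
{ talk 1>&2; } 2> "$dir/both.txt"

cat "$dir/both.txt"
```

The braces matter: inside them `1>&2` sends stdout to stderr, and the `2>` outside then captures both lines. Writing `talk 1>&2 2> both.txt` instead would point stdout at the terminal first, because redirections are applied left to right.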
Richard Wilmot (richardglobal) said (#13):
Thanks. I'm not used to these scripts, so I only vaguely understand these details. It gets us no nearer to finding out why ImageMagick and tesseract work OK on the file outside xsane2tess but not inside it. I suppose I'll never know, and can only use tesseract in OCRfeeder. The HP software that came with the printer included a version of the IRIS OCR software, and it works very well indeed (even better than tesseract!), but of course there's no Linux version. Ho hum!! Thanks again, mycae, for all your very patient help. Unless you have a great inspiration, please don't waste any more time on this.