Xsane and Tesseract OCR

Asked by David Boyd on 2009-09-08

I have downloaded and installed Tesseract 2.04 OCR. It works well with gscan2pdf, but I can't get it to work with Xsane. What OCR command do I need to type into Xsane Setup? Do I need to download anything else to make it work? Please explain in simple terms how to get these two programmes to work together so I can scan with Xsane and then convert to text with Tesseract OCR. I'm using Ubuntu 9.04 Jaunty with an AMD 64 dual core processor.

Question information

Language:
English
Status:
Answered
For:
Ubuntu tesseract
Assignee:
No assignee
Last query:
2009-09-11
Last reply:
2009-09-12
Larry Jordan (larryjor) said : #1

     Personally, I never had much trouble with gocr, though I don't use it often. It would be nice to hear how well (and how much better?) Tesseract works once you have it running.
     I found a page on using it with xsane at http://ubuntuforums.org/showthread.php?p=4304463 that suggests you may need an additional program. It also gives a link for setting it up.
     With gocr, the fields in the xsane setup are:

OCR Command: gocr
Inputfile option: -i
Outputfile option: -o

     The gocr options state that -i (file) reads input from (file) and -o (file) sends output to (file) instead of stdout. As for the command line to use with Tesseract, I don't have it yet and don't see any documentation. I do see some references to a man page (try 'man tesseract' in a terminal to see whether you have it).
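
     To make the three fields concrete: as far as I understand it, xsane joins them with the scanned image and the text file name into a single command line. Here is a minimal sketch of that in Python, assuming plain concatenation. The function build_ocr_command and the file names scan.pnm and scan.txt are made up for illustration and are not part of xsane.

```python
import shlex


def build_ocr_command(command, input_option, output_option, image, text):
    """Join the setup fields and file names the way xsane is assumed to.

    This is an illustration only: the real program may build the command
    differently (for example, it can add progress-pipe options).
    """
    parts = [command, input_option, image, output_option, text]
    # shlex.quote protects file names that contain spaces.
    return " ".join(shlex.quote(part) for part in parts)


# Hypothetical file names, using the gocr fields listed above.
print(build_ocr_command("gocr", "-i", "-o", "scan.pnm", "scan.txt"))
# prints: gocr -i scan.pnm -o scan.txt
```

     Tesseract, by contrast, takes its arguments by position (tesseract IMAGE OUTPUTBASE, and it adds .txt to OUTPUTBASE itself), so there is no input or output option to put in those fields. That may be why an additional program is suggested.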

      Again, please let us know how well it works out for you once you get it set up.

Larry Jordan (larryjor) said : #3

     Sorry, the additional program you need is called xsane2tess. It is referenced in the forum post.

David Boyd (daboyd) said : #4

Thanks Larry. I looked up the page http://doc.ubuntu-fr.org/xsane2tess, but as I can't read French I don't understand how to get the scripts mentioned for xsane2tess. If anyone can translate, again in simple terms, please let me know.

Larry Jordan (larryjor) said : #5

    Oh, sorry; I didn't realize it would be in French at that location, and I thought you might do a web search for it. I found a better location (in English) after searching for you; maybe this will solve it:

http://aur.archlinux.org/packages.php?ID=24702

    There is a link for it at the bottom. It looks like a Bash script, which should make it fairly straightforward to correct (if necessary).

David Boyd (daboyd) said : #6

OK Larry. I went to archlinux.org/packages, downloaded the xsane2tess tarball and unpacked it. Now what do I do?

Larry Jordan (larryjor) said : #7

     It looks as though it is set up to work much the same as gocr. You can test it by running 'xsane2tess -i (input file in graphic format) -o (whatever file name you want for the text file)'. You should be able to set it up in xsane with xsane2tess as the OCR Command and just -i and -o in the other two fields.
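
     If you are curious what such a wrapper has to do, here is a rough sketch in Python. It is not the xsane2tess script itself (that one is written in Bash, and I have not reproduced it here). It only shows the idea: accept -i and -o the way gocr does, call tesseract with positional arguments, and rename the .txt file tesseract produces. The names parse_args, tesseract_command and main, the -l option and the ".tmp" suffix are my own assumptions. Tesseract 2.04 may also need the image converted to TIFF first, which this sketch skips.

```python
import argparse
import os
import subprocess
import sys


def parse_args(argv):
    """Accept the -i/-o convention used in the xsane OCR fields."""
    parser = argparse.ArgumentParser(description="Hypothetical tesseract wrapper")
    parser.add_argument("-i", dest="image", required=True, help="input image")
    parser.add_argument("-o", dest="text", required=True, help="output text file")
    parser.add_argument("-l", dest="lang", default="eng", help="tesseract language")
    return parser.parse_args(argv)


def tesseract_command(image, text, lang="eng"):
    """Return the tesseract command and the file name it will produce.

    Tesseract appends '.txt' to the output base on its own, so a
    temporary base is used and the result is renamed afterwards.
    """
    base = text + ".tmp"
    return ["tesseract", image, base, "-l", lang], base + ".txt"


def main(argv):
    args = parse_args(argv)
    command, produced = tesseract_command(args.image, args.text, args.lang)
    subprocess.run(command, check=True)  # stops with an error if tesseract fails
    os.replace(produced, args.text)      # give the text file the requested name
    return 0


# Only act as a command-line tool when arguments are actually given.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

     Saved under a name of your choice and made executable, a script like this would go in the OCR Command field in place of xsane2tess, with -i and -o in the other two fields as described above.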
