Ubuntu
openoffice.org package

how to use word processor and scanned document image

Asked by wayne bickford on 2012-02-08

How do I type on a scanned document with the word processor?

I did a simple scan from Applications > Graphics.

Then I copied the scanned document onto the desktop, opened the office word processor and dragged the image into the word processor.

I am unable to type on the document in the word processor.

Is there another way to type on a scanned document?

Question information

Language: English

Status: Answered

For: Ubuntu openoffice.org

Assignee: No assignee

Last query: 2012-02-08

Last reply: 2012-02-09

mycae (mycae) said on 2012-02-09:

The short answer is that this will not work the way you are trying to do it. It's possible, but manual re-entry is likely faster for a one-off.

I recommend using an image editing program, like GIMP, if you really need to edit the appearance of the document. If you just want the text, it is likely to be faster to retype it than to run the scan through text recognition. Automatic text recognition *always* needs someone to read the result and correct errors anyway.

If this is not just a one-off, and you have hundreds of these to do, then you might want to look at systems that do this. I discuss these below.

Let me know if you need more info.

---
Long answer:

Image data and text data are not the same thing. An image is a grid of colour and brightness values (sort of). Text is a string of characters.
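To make that difference concrete, here is a small Python sketch. The 5x5 bitmap and the variable names are invented for illustration; a real scan has millions of pixels, but the idea is the same: as an image the letter is a grid of brightness numbers, as text it is a single character code.

```python
# A tiny 5x5 greyscale "scan" of the letter T.
# 0 = black ink, 255 = white paper (values invented for illustration).
image = [
    [0,   0,   0,   0,   0],
    [255, 255, 0,   255, 255],
    [255, 255, 0,   255, 255],
    [255, 255, 0,   255, 255],
    [255, 255, 0,   255, 255],
]

# The same letter as text is just one character with a numeric code.
text = "T"

pixel_count = sum(len(row) for row in image)
print("image:", len(image), "rows,", pixel_count, "numbers in total")
print("text: ", len(text), "character, code", ord(text))

# Nothing in `image` says "T"; it only looks like one to a human
# who reads the pixels. Draw it to see the shape.
for row in image:
    print("".join("#" if pixel < 128 else "." for pixel in row))
```

A word processor can edit `text` directly, but it can only display `image`. Getting from one to the other is the recognition step.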

Unfortunately, what you are trying to do is complicated from the point of view of a computer system. It almost never works with handwriting, and is only passable with a well-presented font and a simple text layout.

Automatic character recognition (also known as Optical Character Recognition, or OCR) is an open computing problem, with many people around the world attempting to provide a functional solution. Other systems rely on the fact that it is a hard problem to solve (those fuzzy number "captcha" things you may hate to type from time to time when filling out online forms).

The current state of OCR under Linux is that it is aimed at advanced users who can pre-process the documents to a level where they are easy for the computer to understand (rotating the image to vertical, removing crease marks or noise in the image).
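As a rough idea of what "removing noise" can mean, here is a minimal Python sketch working on a plain grid of greyscale values. The function names, the threshold of 128 and the sample data are all assumptions made up for this example; in practice you would do this kind of clean-up in an image editor or with an imaging library, and rotation is not covered here.

```python
def binarize(image, threshold=128):
    """Turn greyscale values (0-255) into pure black (0) or white (255)."""
    return [[0 if pixel < threshold else 255 for pixel in row] for row in image]


def remove_specks(image):
    """Whiten any black pixel that has no black neighbour (isolated noise)."""
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]  # copy, so the input is not modified
    for y in range(height):
        for x in range(width):
            if image[y][x] != 0:
                continue
            # Look at the up-to-8 surrounding pixels, staying inside the grid.
            has_black_neighbour = any(
                image[ny][nx] == 0
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_black_neighbour:
                cleaned[y][x] = 255
    return cleaned


# Invented sample: a vertical pen stroke plus one stray dark speck (top right).
scan = [
    [250,  30, 240, 245,  40],
    [245,  20, 250, 235, 250],
    [240,  25, 245, 250, 245],
]

clean = remove_specks(binarize(scan))
for row in clean:
    print("".join("#" if pixel == 0 else "." for pixel in row))
```

The stroke survives because each of its pixels touches another dark pixel, while the lone speck is whitened. Real creases and smudges are larger than one pixel, which is part of why this step usually needs a person.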

The tool of choice (in my opinion) is "tesseract", a "console" (text-based command interface) tool designed for large batch processing. A few front ends exist (ocrfeeder), but I have had little luck with them myself, and have used the "tesseract" tool directly to obtain better results.
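For batch use, the basic command form is `tesseract <image> <output base> -l <language>`, which writes the recognised text to `<output base>.txt`. A minimal Python sketch of wrapping that in a loop follows. The helper names and the `scans/page1.png` path are hypothetical, and it assumes tesseract is installed, is a version that reads PNG files (older ones may want TIFF), and has the `eng` language data available.

```python
import subprocess
from pathlib import Path


def tesseract_command(image_path, language="eng"):
    """Build the command line: tesseract <image> <output base> -l <language>."""
    image_path = Path(image_path)
    output_base = image_path.with_suffix("")  # tesseract adds ".txt" itself
    return ["tesseract", str(image_path), str(output_base), "-l", language]


def ocr_folder(folder, language="eng"):
    """Run tesseract on every .png in `folder`; return the text files written."""
    written = []
    for image_path in sorted(Path(folder).glob("*.png")):
        # check=True raises an error if tesseract fails on a page.
        subprocess.run(tesseract_command(image_path, language), check=True)
        written.append(image_path.with_suffix(".txt"))
    return written


# Show the command that would be run for one (hypothetical) scanned page.
print(" ".join(tesseract_command("scans/page1.png")))
```

Calling `ocr_folder("scans")` would then process a whole directory of scans. The text files it produces still need to be read through and corrected by hand.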

https://help.ubuntu.com/community/OCR

Wikipedia on OCR
https://secure.wikimedia.org/wikipedia/en/wiki/Optical_character_recognition


Tony Pursell (ajpursell) said on 2012-02-09:

Hi Wayne

I have used OCRFeeder (it is in the Software Centre). As it uses OCR technology, it has all the drawbacks that mycae refers to. It does, however, have one good feature: it can output the text produced by the OCR process as an ODT file compatible with OpenOffice.org.

It is quick to install, so why not give it a try?

Tony
