[HowTo] poor man's OCR with ImageMagick and tesseract

Asked by RaiMan

This is not really a question, but a write-up, in the hope that someone else might need it one day.

I'm running primarily Mac OS X 10.6 with Sikuli 10.2 (additionally Win7 32-bit via Boot Camp on a MacBook Pro).

Since my first contact with Sikuli I have been looking forward to getting some OCR features integrated into Sikuli, but that has not become reality yet.
Now I have found a solution that is sufficient for me and bridges the time until Sikuli itself offers a Region.getText() ;-)

While looking for a free OCR application for the Mac, I found that most solutions are based on tesseract-ocr (http://code.google.com/p/tesseract-ocr/), a free OCR engine that by default expects a plain TIFF as input and can produce readable text output in different languages.

on command line:
tesseract input.tif output
which leaves you with an output.txt containing the recognized text.
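
If you need a language other than the default English, tesseract can be told which language data to use with the -l option (only a sketch; it assumes the corresponding language pack, here German, is installed, and the exact usage may vary between tesseract versions):

on command line:
# assumes the German language data (deu) is installed for tesseract
tesseract input.tif output -l deu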

Since Sikuli is based on PNG files, you have to convert these images to TIFF before they can be used with tesseract. On top of that I read on the web that tesseract delivers the best results if the input is in grayscale. My own tests showed that the images should have a resolution of at least 300 dpi to get acceptable results.

Since at first I did not want to invest in Python and/or Java programming (where it would have been possible to use the OpenCV that is available in Sikuli), I decided to use ImageMagick for the conversion process:

on command line:
convert input.png -resample 300 -colorspace Gray output.tif
so output.tif would be the input to tesseract.
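
Putting both steps together, a quick manual test of the whole pipeline could look like this (just a sketch, the file names are examples):

on command line:
# PNG -> 300 dpi grayscale TIFF -> OCR -> show the recognized text
convert input.png -resample 300 -colorspace Gray output.tif
tesseract output.tif result
cat result.txt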

Since at least two additional files are produced, I decided to use the Sikuli temp directory and to rely on the fact that when Sikuli quits, at least the temporary PNGs are deleted.

People out there who live with shell scripting night and day: please don't LOL ;-) it works :-)))
This is the script that I call from inside a Sikuli script:
#!/bin/sh
# make sure the MacPorts binaries (convert, tesseract) are found
export PATH=/opt/local/bin:/opt/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
# $1 is the path of the PNG captured by Sikuli
convert "$1" -resample 300 -colorspace Gray "$1.tif"
# run the OCR; tesseract writes the result to $1.txt
tesseract "$1.tif" "$1" 2>/dev/null
# clean up: drop the TIFF, replace the PNG with the recognized text
rm "$1.tif"
mv "$1.txt" "$1"
# print the text to stdout, so Sikuli can read it
cat "$1"

Mac scripters: I know it's possible to set the default path for new processes somewhere else.
The cat prints the text to stdout, so I get it back into Sikuli.
After the script ends, the temp directory only contains the original .png, which now contains the text instead of the image and will be deleted by Sikuli.

note: there is no parameter yet to select the OCR language or the resolution (but see the sketch below)
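
A possible extension (only a sketch, not tested everywhere): take the language and the resolution as optional second and third parameters, with English and 300 dpi as defaults; this assumes the matching tesseract language data is installed:

#!/bin/sh
export PATH=/opt/local/bin:/opt/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
# $1 = PNG from Sikuli, $2 = tesseract language (default eng), $3 = resolution (default 300)
OCRLANG=${2:-eng}
RES=${3:-300}
convert "$1" -resample "$RES" -colorspace Gray "$1.tif"
tesseract "$1.tif" "$1" -l "$OCRLANG" 2>/dev/null
rm "$1.tif"
mv "$1.txt" "$1"
cat "$1"

The Sikuli side would then simply append the extra parameters to the command string passed to os.popen.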

Now in Sikuli I have my OCR feature in a def():
def myOCR(reg=None, debug=False):
    import os
    if not reg:
        # no region given: let the user select a part of the screen interactively
        i = capture()
    else:
        try:
            # capture the given region to a temporary PNG file
            i = capture(reg.getX(), reg.getY(), reg.getW(), reg.getH())
        except:
            i = None
    if not i:
        exit(1)
    # run the shell script; it replaces the PNG with the recognized text
    # and prints that text to stdout
    f = os.popen("absolute-path-to-the-above-shell-script " + i)
    lines = f.readlines()
    f.close()
    if debug:
        for x in lines:
            print x[:-1]
    return lines

2 possible uses:
--- based on a given region:
m = find(path-to-image-containing-text)
text = myOCR(m)

--- the user selects the region interactively (only useful for some preliminary tests)
text = myOCR()
Currently you will not see which part of the screen was selected.

Setting debug to True will additionally print the text to the message area.

--- the return value is a list of lines that still contain a \n at the end; empty lines (containing only \n) may be included.
A line may contain non-printable characters (I have not analysed this yet, but it may have to do with character sets (UTF, Unicode, ...)).
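
If the non-printable characters get in the way, one option (a sketch, not thoroughly tested) is to filter them out already in the shell script, by replacing the final cat with a tr filter that keeps only printable characters and newlines:

# replace the plain cat at the end of the script
cat "$1" | tr -cd '[:print:]\n'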

--- so what do you have to do if you want to use it:
- get ImageMagick and tesseract-ocr running (I used MacPorts and succeeded at once; see the commands sketched after this list)
- adapt the shell script to your environment
- adapt the Sikuli def() to your situation (at least the path to the shell script)
- gather your own experience
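
For reference, a MacPorts installation could look roughly like this (a sketch; check the exact port names with port search if they differ on your system):

on command line:
# install ImageMagick and the tesseract OCR engine via MacPorts
sudo port install ImageMagick
sudo port install tesseract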

--- my first experiences
- even rather small text is recognized at a high rate
- the processing time is rather short (less than 0.2 seconds for 1 or 2 lines of text, which is close to the average of an optimized find())
- too many graphical elements and too many different fonts in the region may lower the recognition rate
- large regions take some time and may return rubbish (somewhere above 600 x 600); I will not analyse this further, since I doubt it makes sense in a Sikuli script anyway

I have working examples (or just tested with myOCR()):
- read the title of the frontmost app window
- get the name of the frontmost app
- read the tab titles of all tabs in a browser window
- get the names of all running apps from the task window
- read the text from text boxes in pref panes
- read the text of buttons
- read text in pictures in iPhoto
- ...

If you try it, please share your experiences.

niknah (hankin0) said: #1

That's cool, it seems easy to use. The Sikuli blog has another way of doing it, useful if your language is not available in tesseract...

http://sikuli.org/blog/2010/05/06/extract-text-with-clipboard/

You could do with a wiki or somewhere to put these HOWTOs. It'll get lost in here after a while.

RaiMan (raimund-hocke) said: #2

There are many situations where the clipboard cut-and-paste solution does not work (window titles, button texts, text in pictures, menu entries, label text in dialog boxes, ...).

Even with text that can be selected with the mouse, the tesseract solution may be easier to handle, since you don't need to work out the start and end of the drag.

I will put it in the Sikuli wiki, so that it can be referenced in the HowTo section of the Sikuli home page.

Launchpad Janitor (janitor) said: #3

This question was expired because it remained in the 'Open' state without activity for the last 15 days.

oganer@gmail.com (oganer) said: #4

Thanks! Your workaround is very useful for me! Best wishes from Russia.