Text recognition errors with some fonts

Bug #777660 reported by janet
This bug affects 3 people
Affects: SikuliX
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Our app uses Tahoma by default for certain display elements to guarantee that Japanese and Chinese characters are rendered properly.

When I use Sikuli (rc2) text recognition on Windows with the font set to Tahoma (11), it fails to find certain text, mainly letter combinations involving an 'l'.

Examples include ClearBits and Playlists. I assume this is because the characters are spaced so closely together.
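To make the spacing hypothesis concrete, here is a toy Python sketch of one simple segmentation technique: splitting a text bitmap wherever a pixel column is completely empty (a vertical projection profile). It is an illustration only, not how Sikuli or Tesseract actually segment glyphs, and the bitmaps and function names are made up for the example. When a crude 'C' and 'l' are separated by a blank column, the splitter finds two glyphs; when they touch, it finds only one, which is the kind of merge a tightly spaced font can provoke.

```python
# Toy model only: real OCR engines segment glyphs differently;
# this just shows why touching glyphs are harder to split.

# Hypothetical 4-row bitmaps of "Cl" ('#' = ink, ' ' = background).
SPACED = [
    " ## #",
    "#   #",
    "#   #",
    " ## #",
]
TOUCHING = [
    " ###",
    "#  #",
    "#  #",
    " ###",
]


def column_profile(bitmap):
    """Count ink pixels in each column of the bitmap."""
    width = len(bitmap[0])
    return [sum(row[x] == "#" for row in bitmap) for x in range(width)]


def segment_by_gaps(bitmap):
    """Return (start, end) column spans separated by empty columns."""
    spans = []
    start = None
    for x, ink in enumerate(column_profile(bitmap)):
        if ink and start is None:
            start = x                 # a glyph run begins
        elif not ink and start is not None:
            spans.append((start, x))  # a blank column ends the run
            start = None
    if start is not None:             # run reaches the right edge
        spans.append((start, len(bitmap[0])))
    return spans


if __name__ == "__main__":
    print(segment_by_gaps(SPACED))    # two spans: 'C' and 'l'
    print(segment_by_gaps(TOUCHING))  # one span: glyphs merged
```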

To reproduce:
 1. Launch Miro (nightly or beta build from http://nightlies.pculture.org)
 2. Launch Sikuli IDE
 3. Try find("ClearBits") or click("Playlists").
 Result: recognition fails, even when searching in a very limited region.
 4. Try find("Bits") or find("lists").
 Result: recognition succeeds.

Additional info: I had similar text recognition issues with the default Ubuntu font on Linux (Maverick); however, I was able to work around them by changing the system font preferences to something more machine-readable.

Tags: text
RaiMan (raimund-hocke) wrote:

See consolidated bug #710586.

anatoly techtonik (techtonik) wrote:

If I know the exact font used, is it somehow possible to select this font?

I understand that Tesseract is used as the underlying library. Is there a way to pass font parameters to it? If I understand correctly, this would also require data files containing an analysis of the specified font and its mapping to the character set.
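For reference, when the Tesseract command-line tool is called directly (outside Sikuli), a font is not passed as a parameter. Instead, a model trained for it is selected as a "language" with `-l`, which loads a `<lang>.traineddata` file, optionally from a custom directory given with `--tessdata-dir`. The Python sketch below only builds and runs that command. It assumes a reasonably recent Tesseract on the PATH (the `stdout` output base and `--tessdata-dir` are not available in every old release), and the helper names are hypothetical. Whether Sikuli itself lets you choose which traineddata it uses is not covered here.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path, lang="eng", tessdata_dir=None):
    """Build an argv list for the tesseract CLI.

    Using "stdout" as the output base makes tesseract print the
    recognised text instead of writing a .txt file.
    """
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if tessdata_dir is not None:
        # Directory holding <lang>.traineddata files (e.g. a custom font model).
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def ocr(image_path, lang="eng", tessdata_dir=None):
    """Run tesseract if it is installed and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract CLI not found on PATH")
    cmd = build_tesseract_cmd(image_path, lang, tessdata_dir)
    # check=True raises CalledProcessError if tesseract exits with an error.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

A call such as `ocr("playlists.png", lang="tahoma+eng", tessdata_dir="/path/to/custom/tessdata")` would combine a hypothetical `tahoma` model with the stock English one. That model would first have to be produced by training Tesseract on samples of the font, which matches the "data files" mentioned above.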
