Font not recognized

Asked by Carlos Garcia

I would like to get the text in this image [http://puu.sh/5Li7v.png] recognized by Sikuli through the Java API. At the moment, the zero characters are not recognized. Best regards, Carlos

Question information

Language: English
Status: Solved
For: SikuliX
Assignee: No assignee
Solved by: Carlos Garcia

RaiMan (raimund-hocke) said :
#1

If they are not recognized correctly (supposing you are using the latest version of SikuliX), then there is no chance of getting them right with Sikuli's Region.text() alone.

You might check whether it gets better when you read only one of the cells.

Eugene S (shragovich) said :
#2

Hi,

Text recognition is not always accurate. It is a known issue.
However, if it's just short number strings, I have found that creating a kind of "translation table" can be very useful in some cases.

First of all, as RaiMan suggested here, try to enclose the text in a region (that will include that text only).
Then run a couple of tests and see if your results are consistent. Probably you will notice that some numbers are recognized incorrectly. Common mistakes might look like:

recognizing "U" instead of "0"
recognizing "?" instead of "7"

or more bizarre cases, like:
recognizing "CI" instead of "0"
recognizing "C|" instead of "0"
recognizing "EI" instead of "0"

or even:
recognizing "l3" instead of "0"

and:
recognizing "Ei" instead of "6"

If you find such common pairs, you can create some post-processing rules. For example, if your recognized text is stored in a variable named "text", you can do something like this:

text = text.replace('CI', '0')
text = text.replace('Ei', '6')

It can make all the difference you need as long as those recognition mistakes are consistent.
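The rules above could also be collected in a single table. The following is a minimal sketch in plain Python, assuming the mis-recognized pairs listed in this post; the names OCR_FIXES and fix_ocr_text are hypothetical and not part of Sikuli. Longer patterns are applied first so that a single-character rule cannot break up a longer one.

```python
# Hypothetical translation table built from the mis-recognitions listed above.
# Adjust it to whatever your own test runs show consistently.
OCR_FIXES = {
    "CI": "0",
    "C|": "0",
    "EI": "0",
    "l3": "0",
    "Ei": "6",
    "U": "0",
    "?": "7",
}


def fix_ocr_text(text, fixes=OCR_FIXES):
    """Replace known OCR mistakes, longest pattern first."""
    for wrong in sorted(fixes, key=len, reverse=True):
        text = text.replace(wrong, fixes[wrong])
    return text
```

In a script, the input would be whatever Region.text() returned for the narrowed region, for example fix_ocr_text("1CI?") gives "107". A table like this is only safe when the expected text contains digits only, since a genuine "U" or "?" would be rewritten as well.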

Eugene

Christopher Golda (christophergolda) said :
#3

Hi Carlos,

I hope this doesn't sound too ridiculous, but if the image you are trying to capture is in software capable of zooming in (such as text in a browser), you might want to try that first and then use the OCR on the enlarged region. You can use something like type("+", KeyModifier.CTRL) in your script a couple of times, read the text, and then zoom back out.

Just a little while ago, I was desperately trying to get the text method to pull a date / time stamp off of a dynamically generated chart with absolutely no luck. Once I zoomed in a couple of times, it was a night and day difference, recognizing the entire string flawlessly and reliably. Ultimately, I ended up creating a 'zoomInOrOut' Sikuli function (with the number of zoom steps passed in) that I call before and after I need to recognize a difficult string of text. Obviously, this isn't a very elegant solution, but it might be worth giving it a shot if nothing else is working.
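A helper along the lines of the 'zoomInOrOut' function described above might look like the sketch below. It is not the actual function from this post: the name zoom_in_or_out, its parameters, and the injected press callback are assumptions, chosen so the logic can be tried outside Sikuli. It also assumes the application zooms out with CTRL and "-", as most browsers do.

```python
def zoom_in_or_out(times, zoom_in, press):
    """Send the zoom shortcut `times` times.

    press is a callable that receives the key to send together with CTRL.
    Inside a Sikuli script it could be:
        lambda key: type(key, KeyModifier.CTRL)
    """
    key = "+" if zoom_in else "-"  # assumption: CTRL and "-" zooms out
    for _ in range(times):
        press(key)


# Dry run outside Sikuli: record the keys instead of sending them.
sent = []
zoom_in_or_out(2, True, sent.append)   # zoom in before reading the text
zoom_in_or_out(2, False, sent.append)  # restore the zoom level afterwards
print(sent)  # ['+', '+', '-', '-']
```

Passing the key sender in as a parameter keeps the helper independent of Sikuli, so the call sequence can be checked without a screen.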

This coupled with really narrowing the region down to the text has worked wonders for me. I can now happily say that Sikuli does everything that I needed and hoped for, though I will certainly be awaiting the enhanced OCR in subsequent versions.

Hope this helps.

Sincerely

Chris

Christopher Golda (christophergolda) said :
#4

That is, assuming the problem is related more to size than to font type. Good luck.

Carlos Garcia (calgarcia) said :
#5

Hello Christopher, thanks very much for your help.

Now I can get the image recognized almost entirely.

Best regards. Carlos

Christopher Golda (christophergolda) said :
#6

Hi Carlos,

I'm really glad to hear that worked out for you.

Generally, this strategy works well, but if you end up running into any other problems, you can also try using Python to remove the date separators and then do a quick text substitution like Eugene mentioned above. Usually, I can get away with making a few basic assumptions (such as that an erroneous letter O is probably a 0 and an erroneous letter l is probably a 1).

Using the three steps mentioned above (zoom, remove separators, and a best-guess string replacement strategy), you will be covered for the most part.
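The last two of these steps can be done in plain Python once the text has been read. The sketch below is one possible reading of them, assuming the expected value consists only of digits and that "/", "-", ".", ":" and spaces are the separators; clean_numeric_text, SEPARATORS and LOOKALIKES are hypothetical names.

```python
# Assumed separators in a date / time stamp; extend as needed.
SEPARATORS = "/-.: "

# Best-guess substitutions for letters that cannot occur in a number.
LOOKALIKES = {"O": "0", "l": "1"}


def clean_numeric_text(text):
    """Drop separators, then map look-alike letters to digits."""
    cleaned = []
    for ch in text:
        if ch in SEPARATORS:
            continue  # skip date / time separators
        cleaned.append(LOOKALIKES.get(ch, ch))
    return "".join(cleaned)
```

For example, clean_numeric_text("2O13/l2/O5") returns "20131205". Because the separators are removed, the result is best compared against an expected value that has been stripped in the same way.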

Take care and have a good night.

Sincerely,

Chris