TextRecognizer.doOCR() not giving Expected output

Asked by Sahil Doshi on 2019-04-01

I am trying to run an OCR scan on this image: https://opshubtrial-my.sharepoint.com/:i:/g/personal/sahil_doshi_opshub_com/Ec6goWFEMkxBjN0508tsucwBeUljq9XQ_hZMu2PpxYDmyA?e=K9ntP7

using TextRecognizer.doOCR().

Here is the output:

OIadmowledgemdageewi?lthedmve.
©1donotadmowledgemdageewi?n?nedmve.

Expected Output:

I acknowledge and agree with the above.
I do not acknowledge and agree with the above.

Can anyone help me with this?

Question information

Language: English
Status: Answered
For: Sikuli
Assignee: No assignee
Last query: 2019-04-01
Last reply: 2019-04-01

RaiMan (raimund-hocke) said : #1

I tested with your screenshot (on Mac): it is not read correctly.

The reason might be that the font is a bit too small.

Tests with slightly larger fonts work better, up to fully correct results.

Before the image is handed to OCR, it is resized to the optimum of about 300 DPI.
IMHO greyscaling is not needed, since this is done inside Tesseract anyway.
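
To experiment with the "font too small" idea, one option is to enlarge the capture yourself before running OCR on it. The following is only a minimal sketch using plain java.awt, not SikuliX code: the class OcrUpscale and the method upscaleForOcr are hypothetical names, the 96 DPI source value is an assumed example, and whether this improves the result for the image above is untested.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

final class OcrUpscale {

    // Hypothetical helper (not part of SikuliX): rescales src so that an image
    // captured at sourceDpi gets the pixel size it would have at targetDpi.
    static BufferedImage upscaleForOcr(BufferedImage src, int sourceDpi, int targetDpi) {
        double factor = (double) targetDpi / sourceDpi;
        int width = (int) Math.round(src.getWidth() * factor);
        int height = (int) Math.round(src.getHeight() * factor);

        // TYPE_INT_RGB drops any alpha channel, which is fine for screen captures.
        BufferedImage scaled = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        try {
            // Bicubic interpolation keeps glyph edges smoother than the default.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Stand-in for a real capture; 96 DPI is an assumed screen resolution.
        BufferedImage capture = new BufferedImage(96, 20, BufferedImage.TYPE_INT_RGB);
        BufferedImage scaled = upscaleForOcr(capture, 96, 300);
        System.out.println(scaled.getWidth() + "x" + scaled.getHeight());
    }
}
```

The enlarged image can then be passed to the OCR call in place of the original capture. Passing a target above 300 is a way to test whether larger glyphs alone fix the misread characters.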

Mike (maestro+++) said : #2

If you are using OCR on a web page, how do you know when you've got 300 DPI?

RaiMan (raimund-hocke) said : #3

This relates only to the screen resolution (screen pixels are the only thing SikuliX knows about).
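
As a rough illustration of what "related to the screen resolution" can mean in practice: Java reports a screen DPI value through java.awt.Toolkit, and a resize factor toward 300 DPI follows from it. This is a sketch under assumptions, not a statement of how SikuliX computes its factor internally; DpiFactor and resizeFactor are hypothetical names, and the value the OS reports need not match the physical DPI of the monitor.

```java
import java.awt.GraphicsEnvironment;
import java.awt.Toolkit;

final class DpiFactor {

    // Target taken from the thread ("the optimum of about 300 DPI").
    static final int TARGET_DPI = 300;

    // Hypothetical helper: factor by which a capture would be enlarged
    // to go from the reported screen DPI to the target DPI.
    static double resizeFactor(int screenDpi) {
        return (double) TARGET_DPI / screenDpi;
    }

    public static void main(String[] args) {
        if (GraphicsEnvironment.isHeadless()) {
            System.out.println("No display available, cannot query screen resolution.");
            return;
        }
        // Dots per inch as reported by the OS for the default screen.
        int screenDpi = Toolkit.getDefaultToolkit().getScreenResolution();
        System.out.printf("screen DPI: %d, resize factor: %.3f%n",
                screenDpi, resizeFactor(screenDpi));
    }
}
```

For a web page this means the browser zoom level matters only indirectly: zooming in makes the glyphs occupy more screen pixels, while the reported screen DPI stays the same.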
