TextRecognizer.doOCR() not giving expected output

Asked by Sahil Doshi

I am trying to perform an OCR scan on this image: https://opshubtrial-my.sharepoint.com/:i:/g/personal/sahil_doshi_opshub_com/Ec6goWFEMkxBjN0508tsucwBeUljq9XQ_hZMu2PpxYDmyA?e=K9ntP7

using TextRecognizer.doOCR();

Here is the output:

OIadmowledgemdageewi?lthedmve.
©1donotadmowledgemdageewi?n?nedmve.

Expected Output:

I acknowledge and agree with the above.
I do not acknowledge and agree with the above.

Can anyone help me with this?

Question information

Language: English
Status: Answered
For: SikuliX
Assignee: No assignee
RaiMan (raimund-hocke) said :
#1

I tested with your screenshot (on Mac):
it is not read correctly.

The reason might be that the font is a bit too small.

Tests with slightly larger fonts give better results, up to fully correct ones.

Before giving the image to OCR, it is resized to the optimum of about 300 DPI.
IMHO greyscaling is not needed, since this is done inside Tesseract anyways.
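
To illustrate the resizing step, here is a minimal sketch in plain Java. It is not SikuliX's internal code: the class name OcrUpscale and its methods are hypothetical, and the 96 DPI screen resolution is only an assumed example value. It shows how a screen capture could be scaled so that its pixel density corresponds to roughly 300 DPI before being handed to an OCR engine.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of SikuliX.
public class OcrUpscale {
    // Target resolution named in the reply above.
    static final double TARGET_DPI = 300.0;

    // Scale factor that brings a capture taken at screenDpi up to TARGET_DPI.
    static double scaleFactor(int screenDpi) {
        if (screenDpi <= 0) {
            throw new IllegalArgumentException("screenDpi must be positive");
        }
        return TARGET_DPI / screenDpi;
    }

    // Returns a resized copy of src, drawn with bicubic interpolation.
    static BufferedImage resize(BufferedImage src, double factor) {
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        // Stand-in for a 200x40 pixel screen capture (assumed 96 DPI screen).
        BufferedImage capture = new BufferedImage(200, 40, BufferedImage.TYPE_INT_RGB);
        BufferedImage scaled = resize(capture, scaleFactor(96));
        System.out.println(scaled.getWidth() + "x" + scaled.getHeight()); // prints 625x125
    }
}
```

On a desktop session the screen resolution could be read with java.awt.Toolkit.getDefaultToolkit().getScreenResolution() instead of hard-coding 96. Upscaling only interpolates existing pixels, so very small fonts may still be misread.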

Mike (maestro+++) said :
#2

If you are using OCR on a web page, how do you know when you've got 300 DPI?

RaiMan (raimund-hocke) said :
#3

This is only related to the screen resolution (screen pixels are the only thing SikuliX knows about).
