SikuliX

TextRecognizer.doOCR() not giving Expected output

Asked by Sahil Doshi on 2019-04-01

I am trying to run an OCR scan on this image: https://opshubtrial-my.sharepoint.com/:i:/g/personal/sahil_doshi_opshub_com/Ec6goWFEMkxBjN0508tsucwBeUljq9XQ_hZMu2PpxYDmyA?e=K9ntP7

using TextRecognizer.doOCR().

Here is the output:

OIadmowledgemdageewi?lthedmve.
©1donotadmowledgemdageewi?n?nedmve.

Expected Output:

I acknowledge and agree with the above.
I do not acknowledge and agree with the above.

Can anyone help me with this?

Question information

Language: English

Status: Answered

For: SikuliX

Assignee: No assignee

Last query: 2019-04-01

Last reply: 2019-04-01

RaiMan (raimund-hocke) said on 2019-04-01:

I tested with your screenshot (on Mac):
it is not read correctly.

The reason might be that the font is a bit too small.

Tests with slightly larger fonts give better, up to correct, results.

Before the image is given to OCR, it is resized to the optimum of about 300 DPI.
IMHO greyscaling is not needed, since this is done inside Tesseract anyway.
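
For anyone who wants to try enlarging the capture themselves before OCR, here is a minimal sketch using only the standard Java imaging classes. It is not the SikuliX implementation; the class name OcrPrescale, the method upscale, and the choice of bicubic interpolation are assumptions made for illustration.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of SikuliX: enlarges an image before it is handed to OCR.
final class OcrPrescale {

    // Returns a copy of src scaled by factor, using bicubic interpolation (an assumed choice).
    static BufferedImage upscale(BufferedImage src, double factor) {
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null); // draw the source stretched to the new size
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        // A blank 40x10 stand-in for a small screen capture.
        BufferedImage small = new BufferedImage(40, 10, BufferedImage.TYPE_INT_RGB);
        BufferedImage big = upscale(small, 3.125); // 3.125 = 300 / 96, assuming a 96 DPI screen
        System.out.println(big.getWidth() + "x" + big.getHeight());
    }
}
```

Enlarging cannot recover detail that is not in the capture, so if the text is still misread, making the font larger on screen before capturing is probably the more reliable route.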

Mike (maestro+++) said on 2019-04-01:

If you are using OCR on a web page, how do you know when you've got 300 DPI?

RaiMan (raimund-hocke) said on 2019-04-01:

This only depends on the screen resolution (screen pixels are the only thing SikuliX knows about).
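
To make the relation between screen resolution and the 300 DPI target concrete, here is a small sketch of the arithmetic, assuming the resize factor is simply the target DPI divided by the screen DPI. Whether SikuliX computes it exactly this way is not stated in this thread; the class DpiScale, the method factorFor, and the 96 DPI default are hypothetical.

```java
// Hypothetical helper, not part of SikuliX: resize factor needed to reach about 300 DPI.
final class DpiScale {

    // The "about 300 DPI" optimum mentioned in this thread.
    static final double TARGET_DPI = 300.0;

    // Assumption: factor = target DPI / screen DPI.
    static double factorFor(int screenDpi) {
        if (screenDpi <= 0) {
            throw new IllegalArgumentException("screenDpi must be positive");
        }
        return TARGET_DPI / screenDpi;
    }

    public static void main(String[] args) {
        // On a desktop JVM the value could come from
        // java.awt.Toolkit.getDefaultToolkit().getScreenResolution();
        // here it is passed as an argument, with 96 as an assumed example value.
        int screenDpi = args.length > 0 ? Integer.parseInt(args[0]) : 96;
        System.out.println("screen DPI " + screenDpi + " -> factor " + factorFor(screenDpi));
    }
}
```

So for a web page nothing special is needed: only the resolution the screen reports matters, not anything about the page itself.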
