SikuliX

Text OCR related question

Asked by Subhash on 2015-07-17

Hi,

I am using Sikuli for scraping text from the screen.
I am on version 1.0 rc3.

I am having good success so far in terms of recognition itself.

However, my use case is to identify text that ends with a colon. This will help me tag text with greater accuracy, and I need to find the text on either side of the colon (:).

I am using Region.text() to get all text tokens from a screen region.

However, I see that the OCR process specifically filters the colon out of the image.
I can see this in the intermediate files it creates: xxx-lineblobs.vlog.png has the colon, and the subsequent processed output xxx-lineblobs-filtered.vlog.png has it removed.

This is critical for my processing step.
Is there any way I can configure the OCR so that the colon is treated as just another character and retained in the results?

Regards
Subhash

Question information

Language: English

Status: Answered

For: SikuliX

Assignee: No assignee

Last query: 2015-07-17

Last reply: 2015-07-17


RaiMan (raimund-hocke) said on 2015-07-17:

Sorry, no way.

You would have to adapt the C++ source code and then build it from the sources.

Another option is to use Tesseract from the command line:
- create an image file from the respective region using capture
- run the Tesseract command from inside the script using e.g. subprocess or Java's Runtime.exec
- get the output of the Tesseract command (e.g. stdout)
With this you can use any option Tesseract 3 offers.
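
The steps above could be sketched roughly as follows in Python (SikuliX scripts run on Jython, so the code avoids Python 3 only features). This is an illustration, not something from SikuliX itself: the helper names build_tesseract_command, run_tesseract and split_label_value are made up for this example. It assumes the tesseract executable is on the PATH, that the installed version accepts "stdout" as the output base (otherwise pass a file name and read the resulting .txt file), and that the page segmentation flag is spelled -psm as in Tesseract 3.

```python
import subprocess


def build_tesseract_command(image_path, psm=None):
    """Build the argument list for a Tesseract 3 command-line call."""
    # Assumption: "stdout" as the output base makes Tesseract print the
    # recognized text instead of writing <outputbase>.txt.
    cmd = ["tesseract", image_path, "stdout"]
    if psm is not None:
        # Tesseract 3 spells the page segmentation flag "-psm".
        cmd += ["-psm", str(psm)]
    return cmd


def run_tesseract(image_path, psm=None):
    """Run Tesseract on an image file and return the recognized text."""
    proc = subprocess.Popen(build_tesseract_command(image_path, psm),
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("tesseract failed: %r" % err)
    return out.decode("utf-8")


def split_label_value(line):
    """Split 'Label: value' at the first colon; return None without one."""
    label, sep, value = line.partition(":")
    if not sep:
        return None
    return label.strip(), value.strip()
```

Inside a SikuliX script this might be used as run_tesseract(capture(region)), assuming capture returns the path of the saved image file, and each line of the result can then be passed to split_label_value to get the text on either side of the colon.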

Major improvements to the SikuliX OCR feature will only come with version 2, beginning sometime in 2016.


Subhash (subhash-bylaiah) said on 2015-07-21:

Thanks very much, Raimund.
