Image segmentation not working

Asked by warpitaly

Greetings,
I'm using Cuneiform 0.9 on Ubuntu.

I'm processing several invoices, and (too) often the tabular parts are treated as images and skipped by the OCR.
Is this normal?

Furthermore, the images are referenced in the hOCR output, but their bounding boxes are always set to zero.

Hence:
1) Is there a way to get a less greedy image/text splitter?
2) Is there a way to get the correct coordinates in the hOCR output, so that the image regions can be processed recursively?

Thanks!
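
For anyone who wants to check their own output for the zero-box symptom described in 2), here is a minimal sketch. It assumes only that bounding boxes appear as "bbox x0 y0 x1 y1" inside an element's title attribute, as the hOCR format specifies. The sample markup and the names BBoxCollector and find_empty_boxes are made up for illustration and are not actual Cuneiform output or API.

```python
import re
from html.parser import HTMLParser

# hOCR stores coordinates in the title attribute as "bbox x0 y0 x1 y1".
BBOX_RE = re.compile(r"bbox\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)")


class BBoxCollector(HTMLParser):
    """Collects (tag, class, bbox) for every element that carries a bbox."""

    def __init__(self):
        super().__init__()
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        match = BBOX_RE.search(attrs.get("title") or "")
        if match:
            box = tuple(int(v) for v in match.groups())
            self.boxes.append((tag, attrs.get("class") or "", box))


def find_empty_boxes(hocr_text):
    """Return the entries whose bounding box has no width or no height."""
    collector = BBoxCollector()
    collector.feed(hocr_text)
    return [
        (tag, cls, (x0, y0, x1, y1))
        for tag, cls, (x0, y0, x1, y1) in collector.boxes
        if x1 <= x0 or y1 <= y0
    ]


# Hypothetical hOCR fragment, written by hand for this example.
sample = """<div class="ocr_page" title="bbox 0 0 800 600">
<p class="ocr_par" title="bbox 10 10 400 50">text</p>
<img class="ocr_image" title="bbox 0 0 0 0" alt=""/>
</div>"""

empty = find_empty_boxes(sample)
print(empty)
```

If the list is non-empty for image elements, the coordinates cannot be used to crop those regions and feed them back to the OCR, which is the recursive processing asked about in 2).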

Question information

Language: English
Status: Solved
For: Cuneiform for Linux
Assignee: No assignee
Solved by: Yury V. Zaytsev

Yury V. Zaytsev (zyv) said :
#1

1) Unfortunately, there is no way to control it at the moment. I think you can disable it altogether, but I am not sure whether it will then try to recognize everything or simply discard the images.

2) If this is reproducible on the latest trunk, please file a bug, because it shouldn't normally happen.

warpitaly (giorgio-davanzo) said :
#2

1) Any hints on how to disable it?
2) I filed the bug, thanks!

Best Yury V. Zaytsev (zyv) said :
#3
warpitaly (giorgio-davanzo) said :
#4

Thanks Yury V. Zaytsev, that solved my question.

warpitaly (giorgio-davanzo) said :
#5

Just a small note: the patch does not work anymore; however, I simply changed the file manually and everything worked out fine!
Thanks a lot!!!

warpitaly (giorgio-davanzo) said :
#6

A further note: the solution solved 1) but not 2), so I'm leaving the bug open.