Image segmentation not working

Asked by warpitaly on 2010-03-26

I'm using cuneiform 0.9 on Ubuntu.

I'm processing several invoices, and (too) often tabular parts are processed as images and ignored by the OCR.
Is it normal?

Furthermore, the images are referenced in the hOCR output, but their bounding boxes are always set to zero.

1) Is there a way to get a less greedy image/text splitter?
2) Is there a way to get the right coordinates in the hOCR output, to allow recursive processing?


Question information

Language: English
Project: Cuneiform for Linux
Assignee: none
Solved by: Yury V. Zaytsev

Yury V. Zaytsev (zyv) said : #1

1) Unfortunately, there's no way to control it at the moment. I think you can disable it altogether, but I am not sure whether it will then try to recognize everything or just discard the images.

2) If this is reproducible on latest trunk, please file a bug, because it shouldn't normally happen.
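
For anyone who wants to check whether the zero bounding boxes are reproducible before filing a bug, here is a minimal sketch in Python. It assumes the hOCR output follows the usual convention of storing coordinates in a `title` attribute as `bbox x0 y0 x1 y1`; exactly which tag Cuneiform uses for image regions is not stated in this thread, so the script simply scans every element. The names `BBoxCollector` and `find_zero_boxes` and the sample markup are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
BBOX_RE = re.compile(r"bbox\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)")


class BBoxCollector(HTMLParser):
    """Collects (tag, bbox) pairs for every element that carries a bbox."""

    def __init__(self):
        super().__init__()
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None, so fall back to an empty string.
        title = dict(attrs).get("title") or ""
        match = BBOX_RE.search(title)
        if match:
            self.boxes.append((tag, tuple(int(v) for v in match.groups())))


def find_zero_boxes(hocr_text):
    """Return the elements whose bounding box is all zeros."""
    collector = BBoxCollector()
    collector.feed(hocr_text)
    return [(tag, box) for tag, box in collector.boxes if box == (0, 0, 0, 0)]


# Hypothetical hOCR fragment, not real Cuneiform output.
sample = (
    '<p><span class="ocr_line" title="bbox 10 20 300 40">text</span>'
    '<img title="bbox 0 0 0 0" /></p>'
)
print(find_zero_boxes(sample))
```

To run it on a real file, read the hOCR file into a string and pass it to `find_zero_boxes`; a non-empty result would be worth attaching to the bug report.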

warpitaly (giorgio-davanzo) said : #2

1) Any hints on how to disable it?
2) I filed the bug, thanks!

warpitaly (giorgio-davanzo) said : #4

Thanks Yury V. Zaytsev, that solved my question.

warpitaly (giorgio-davanzo) said : #5

Just a small note: the patch does not work anymore; however, I simply changed the file manually and everything worked out fine!
Thanks a lot!!!

warpitaly (giorgio-davanzo) said : #6

A further note: the solution solved 1) but not 2), so I'm leaving the bug open.