Image segmentation not working

Asked by warpitaly on 2010-03-26

Greetings,
I'm using cuneiform 0.9 on Ubuntu.

I'm processing several invoices, and far too often the tabular parts are classified as images and skipped by the OCR.
Is this normal?

Furthermore, the images are referenced in the hOCR output, but their bounding boxes are always set to zero.

Hence:
1) Is there a way to get a less greedy image/text splitter?
2) Is there a way to get the right coordinates in the hOCR output, so that the image regions can be processed recursively?

Thanks!

Question information

Language: English
Status: Solved
For: Cuneiform for Linux
Assignee: No assignee
Solved by: Yury V. Zaytsev
Solved: 2010-03-27
Last query: 2010-03-27
Last reply: 2010-03-26

Yury V. Zaytsev (zyv) said : #1

1) Unfortunately, there's no way to control it at the moment. I think you can disable it altogether, but I am not sure whether it will then try to recognize everything or just discard the images.

2) If this is reproducible on the latest trunk, please file a bug, because it shouldn't normally happen.
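
For anyone wanting to check whether the zero bounding boxes are reproducible before filing a bug, here is a minimal sketch in Python. It relies only on the hOCR convention that coordinates are stored as "bbox x0 y0 x1 y1" inside an element's title attribute. How Cuneiform tags image regions is not stated in this thread, so the sketch makes no assumption about that and simply checks every element; the function name find_zero_bboxes is made up for illustration.

```python
import re
from html.parser import HTMLParser

# hOCR stores coordinates in the title attribute as "bbox x0 y0 x1 y1".
BBOX_RE = re.compile(r"bbox\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)")


class ZeroBBoxFinder(HTMLParser):
    """Collects (tag, class) for every element whose bbox is all zeros."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Attribute values may be None (valueless attributes), hence "or".
        title = attr_map.get("title") or ""
        match = BBOX_RE.search(title)
        if match and all(int(value) == 0 for value in match.groups()):
            self.hits.append((tag, attr_map.get("class")))


def find_zero_bboxes(hocr_text):
    """Hypothetical helper: return elements with a 'bbox 0 0 0 0' title."""
    finder = ZeroBBoxFinder()
    finder.feed(hocr_text)
    return finder.hits
```

Reading the hOCR file produced by cuneiform into a string and passing it to find_zero_bboxes gives a list of the affected elements; an empty list would suggest the problem does not occur for that input.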

warpitaly (giorgio-davanzo) said : #2

1) Any hints on how to disable it?
2) I filed the bug, thanks!

warpitaly (giorgio-davanzo) said : #4

Thanks Yury V. Zaytsev, that solved my question.

warpitaly (giorgio-davanzo) said : #5

Just a small note: the patch does not work anymore; however, I simply changed the file by hand and everything worked out fine!
Thanks a lot!!!

warpitaly (giorgio-davanzo) said : #6

A further note: the solution solved 1) but not 2), so I'm leaving the bug open.