Question: TIFF to hOCR conversion takes about 1 1/2 seconds on average on Linux

Asked by Raghu

I need some clarification regarding the time needed to create an hOCR file from a TIFF image.

I am running Cuneiform on Linux. When I create an hOCR file from a TIFF image, the program takes about 1 1/2 seconds on average.

My first question: is it normal for OCR conversion to take about 1 1/2 seconds?

Is there a way I can improve the performance of Cuneiform?

We need to convert a large number of TIFF documents to hOCR format in real time.

Thanks,
Raghu

Question information

Language: English
Status: Solved
For: Cuneiform for Linux
Assignee: No assignee
Solved by: Yury V. Zaytsev
Yury V. Zaytsev (zyv) said :
#1

1 1/2 = 1 minute 30 seconds? That's pretty weird but not totally unreasonable.

1) Are you able to provide us with a sample image?
2) Did you try to convert it to BMP to see whether it works quicker or not?

Raghu (rsudupa) said :
#2

Thanks Yury for your quick response.

No, I mean 1500 milliseconds (1.5 seconds).

We receive fax files in G3/G4 format and convert them to TIFF first. The TIFF files are what I pass to Cuneiform.

I can send you the TIFF image, but I do not see an option for attaching files in Launchpad.

Thanks,
Raghu

Yury V. Zaytsev (zyv) said (best answer) :
#3

1) I know you need TIFF. The reason I am asking you to compare TIFF vs BMP is that we need to determine whether this delay is caused by the ImageMagick pre-processing (TIFF -> BMP) or by the OCR engine itself.

2) If it's the engine, then, unfortunately, I don't think we can improve this easily. 1.5 seconds is fast. A quick Google search for "ocr speed test" reveals the following page: http://ocrcapture.wordpress.com/2008/06/24/ocr-application-speed-testing-which-is-best/ - most commercial-grade OCR engines take ~1-2 seconds per page...
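
The TIFF vs BMP comparison from point 1 could be scripted roughly as follows. This is only a sketch under some assumptions: ImageMagick's `convert` and `cuneiform` are both on the PATH, the Cuneiform build accepts `-f hocr` and `-o`, and the file names `page.bmp` and `page.html` are made up for illustration. It times the conversion step and the two OCR runs separately so the slow stage becomes visible.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def build_commands(tiff_path, bmp_path, hocr_path):
    """Return the three steps to time, in the order they must run."""
    return {
        # Assumed ImageMagick invocation for the TIFF -> BMP step.
        "convert TIFF -> BMP": ["convert", tiff_path, bmp_path],
        # Assumed Cuneiform options: -f selects the format, -o the output file.
        "cuneiform on BMP": ["cuneiform", "-f", "hocr", "-o", hocr_path, bmp_path],
        "cuneiform on TIFF": ["cuneiform", "-f", "hocr", "-o", hocr_path, tiff_path],
    }


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # page.bmp and page.html are hypothetical scratch file names.
        steps = build_commands(sys.argv[1], "page.bmp", "page.html")
        for label, cmd in steps.items():
            median = statistics.median(time_command(cmd))
            print(f"{label}: median {median * 1000:.0f} ms")
    else:
        print("usage: python time_ocr.py input.tif")
```

If the "cuneiform on BMP" and "cuneiform on TIFF" medians are close, the time is going into the OCR engine itself; a large gap would point at the image pre-processing instead. Note that each measurement also includes process start-up time.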

Raghu (rsudupa) said :
#4

Thanks Yury,

with best regards,
Raghu