Question: TIFF to hOCR conversion takes about 1 1/2 seconds on average on Linux

Asked by Raghu on 2009-03-03

I need some clarification regarding the time needed to create an hOCR file from a TIFF image.

I am running Cuneiform on Linux. When I create an hOCR file from a TIFF image, the program takes about 1 1/2 seconds on average.

My first question: is it normal for OCR conversion to take about 1 1/2 seconds?

Is there a way I can improve the performance of Cuneiform?

We need to convert a large number of TIFF documents to hOCR format in real time.


Question information

English
Cuneiform for Linux
No assignee
Solved by: Yury V. Zaytsev

Yury V. Zaytsev (zyv) said : #1

1 1/2 = 1 minute 30 seconds? That's pretty weird but not totally unreasonable.

1) Are you able to provide us with a sample image?
2) Did you try to convert it to BMP to see whether it works quicker or not?

Raghu (rsudupa) said : #2

Thanks Yury for your quick response.

No, I mean 1500 milliseconds (1.5 seconds).

We receive fax files in G3/G4 format and convert them to TIFF first. I am using the TIFF format.

I can send you the TIFF image, but I do not see an option for attaching files in Launchpad.


Best Yury V. Zaytsev (zyv) said : #3

1) I know you need TIFF. The reason I am asking you to compare TIFF vs BMP is that we need to determine whether this delay is caused by the ImageMagick pre-processing (TIFF -> BMP) or by the OCR engine itself.

2) If it's the engine, then, unfortunately, I don't think we can improve this easily. 1.5 seconds is fast. A quick Google search for "ocr speed test" reveals the following page: - most commercial-grade OCR engines take ~1-2 seconds per page...
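
To make the TIFF vs BMP comparison from point 1 concrete, here is a minimal Python sketch that times each step separately. It rests on assumptions: the file names are hypothetical, the ImageMagick command is assumed to be `convert` (newer installs may call it `magick`), and the `cuneiform -f hocr -o <output> <input>` invocation should be checked against `cuneiform --help` on your system.

```python
import subprocess
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times; return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (best, mean) of a non-empty list of durations."""
    return min(durations), sum(durations) / len(durations)


def compare_tiff_vs_bmp(tiff_path="page.tif", bmp_path="page.bmp"):
    """Time the conversion and both OCR paths (command lines are assumptions)."""
    steps = {
        # Must run first: it produces the BMP used by the next step.
        "convert TIFF -> BMP": ["convert", tiff_path, bmp_path],
        "OCR on BMP": ["cuneiform", "-f", "hocr", "-o", "out_bmp.html", bmp_path],
        "OCR on TIFF": ["cuneiform", "-f", "hocr", "-o", "out_tif.html", tiff_path],
    }
    for label, cmd in steps.items():
        best, mean = summarize(time_command(cmd))
        print(f"{label}: best {best:.3f} s, mean {mean:.3f} s")
```

Calling `compare_tiff_vs_bmp("fax_page.tif")` prints one line per step. If "OCR on TIFF" is roughly "convert TIFF -> BMP" plus "OCR on BMP", and the conversion share is small, the time is being spent in the engine rather than in pre-processing.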

Raghu (rsudupa) said : #4

Thanks Yury,

with best regards,