Tesseract pattern not enforced?

Asked by matteoa on 2020-09-16

Hello,
I'm trying to OCR a text field on the target that contains codes following a fixed pattern (implemented as a user patterns file, in Tesseract terms):
P\n\n\n\n
C\n\n\n\n
B\n\n\n\n
U\n\n\n\n

In practice, each code is one letter (P, C, B or U) followed by 4 hex digits.
The length is always exactly 5 characters in total.

So, at least as I intend this pattern file to work, correct output would be, for example:
P0123, P2EFD, C12EF, B2BCD and so on.
When I run the script, the vast majority of the output is as expected, but I also get some results like PPB, PFF3, CC3 and so on.
Is there a way to enforce stricter adherence to the pattern, which I set up in SikuliX (Jython) like this:
OCR.globalOptions().variable("user_patterns_file", "C:\\Sikulix\\Util\\Code_OCR.Pattern")
OCR.globalOptions().variable("tessedit_char_whitelist", "PCBU0123456789ABCDEF")
OCR.globalOptions().variable("tessedit_char_blacklist", "abcdefGgHhIiLlMmNnOopQqRrSsTtuVvZzJjYyKkWw-!|")
OCR.globalOptions().variable("load_system_dawg", "F")
OCR.globalOptions().variable("load_freq_dawg", "F")
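
Independently of whether Tesseract honors the pattern file, a possible fallback is to validate each OCR result after the fact and discard (or retry) anything that does not match. The following is only a minimal sketch under the format described above (one of P, C, B, U plus exactly 4 hex digits); the names CODE_RE, is_valid_code and filter_codes are hypothetical helpers, not part of SikuliX or Tesseract, and it assumes the OCR results are available as plain strings:

```python
import re

# Assumed format, taken from the description above:
# one letter out of P, C, B, U followed by exactly four uppercase hex digits.
CODE_RE = re.compile(r"^[PCBU][0-9A-F]{4}\Z")


def is_valid_code(text):
    """Return True if the stripped text is a well-formed 5-character code."""
    return CODE_RE.match(text.strip()) is not None


def filter_codes(results):
    """Keep only the OCR results that match the expected pattern."""
    return [r.strip() for r in results if is_valid_code(r)]


# Sample values copied from this question: four good codes, three bad ones.
samples = ["P0123", "P2EFD", "C12EF", "B2BCD", "PPB", "PFF3", "CC3"]
accepted = filter_codes(samples)
print(accepted)
```

With the sample values from this question, only the four 5-character codes are kept; PPB, PFF3 and CC3 are rejected because they are too short. This does not make the OCR itself more accurate, it only flags results that would need a second read.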

Thanks in advance.
My configuration is:
2.0.4-2020-03-14_08:01/Windows10.0/Java8(64)1.8.0_251-b08

Question information

Language: English
Status: Answered
For: Sikuli
Assignee: No assignee
Last query: 2020-09-16
Last reply: 2020-09-17

matteoa (matteoa) said : #3

Hi,
I received the email with the responses from the Great RaiMan, but I'm unable to see them here.
I've forwarded the question to the Tesseract user group and will report back if I get any responses:
https://groups.google.com/g/tesseract-ocr/c/Hmv6YlWYXB8

RaiMan (raimund-hocke) said : #4

LOL, Great RaiMan has hidden his nonsense comments ;-)

I agree that for this very special case the Tesseract user group is the better place.

Thanks for the feedback.
