how to use OCR.readLines?

Asked by matteoa on 2020-06-18

Hello,
First of all, my config is:
2.0.4-2020-03-14_08:01/Windows10.0/Java8(64)1.8.0_251-b08

I have to do OCR on a table, and it works rather well if I do it line by line.
I thought it could be faster to do it all at once with OCR.readLines and psm = 6, so I tried capturing the entire table as one image and passing it to OCR.readLines.

This is my test code that uses manual screen region selection:
img=selectRegion()
imgFile=capture (img)
OCR.options.psm(6)
lstText=OCR.readLines(img)
print "lstLEN=" + str(len(lstText))
print "lst=" + str(lstText)
for i in range(len(lstText)):
    print "mat="+str( lstText[i])
    print "Txt="+OCR.readLine(lstText[i])
exit(0)

From the output I see that there are 16 matches, that is correct, but then the
print "Txt="+OCR.readLine(lstText[i])
line gives an error:
[error] script [ GM_DTC ] stopped with error in line 127
[error] java.lang.NullPointerException ( java.lang.NullPointerException )
[error] --- Traceback --- error source first
line: module ( function ) statement
344: Utility ( getLineText ) strOcr = OCR.readLine(rReg, self.options)
127: main ( <module> ) print "Txt="+OCR.readLine(lstText[i])
[error] --- Traceback --- end --------------

I suppose this is a stupid mistake on my side, but once I have obtained the list of matches (as per the docs), how can I extract the strings from it?

Thanks for your support

After setting the options (including smallfont), I printed them; they are:
OCR.Options:
data = C:\Users\Myself\AppData\Roaming\Sikulix\SikulixTesseract\tessdata
language(eng) oem(3) psm(6) height(10,0) factor(3,00) dpi(96)
variables: user_defined_dpi:300

Question information

Language: English
Status: Solved
For: Sikuli
Assignee: No assignee
Solved by: matteoa
Solved: 2020-06-18
Last query: 2020-06-18
Last reply: 2020-06-18

RaiMan (raimund-hocke) said : #1

Simply like this:

reg = selectRegion()
lines = reg.textLines()
for line in list:
    uprint(line)

uprint() is needed instead of print, because the recognized text might contain non-ASCII characters (Unicode/UTF-8).
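
On the original question of getting strings out of the list that OCR.readLines returns: as far as I know, each entry is a Match that already carries its recognized text, so it should be read from the match rather than by running OCR.readLine on it again. A minimal sketch, assuming the match exposes a getText() accessor; texts_from_matches and FakeMatch are hypothetical names, and FakeMatch exists only so the snippet runs outside SikuliX:

```python
def texts_from_matches(matches):
    # Collect the text stored in each match, keeping the original order.
    # Assumption: every match object offers a getText() method.
    return [m.getText() for m in matches]


class FakeMatch(object):
    # Hypothetical stand-in for a SikuliX match, for illustration only.
    def __init__(self, text):
        self._text = text

    def getText(self):
        return self._text


# Two fake table rows in place of real OCR results.
sample = [FakeMatch(u"row 1"), FakeMatch(u"row 2")]
rows = texts_from_matches(sample)
```

Inside a SikuliX script the helper would presumably be called as texts_from_matches(OCR.readLines(img)), with uprint() used to show each string.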

matteoa (matteoa) said : #2

Hi RaiMan,
thanks for the prompt response!!
I've tried your snippet, slightly modified (the loop has to iterate over lines rather than list, and PSM 6 is absolutely needed):
textOCR.setPSM(6)
reg = selectRegion()
lines = reg.textLines()
for line in lines:
    uprint(line)

Works perfectly!!
Thanks!!!

matteoa (matteoa) said : #3

Thanks RaiMan, this solved my problem!