Tesseract params and config

Asked by sasha

Hi,
I'm using Sikuli 1.0.1 for a script editor application.
I use it for all sorts of automation, but as soon as I started botting for games I found I needed to understand Tesseract more deeply.
So now I need to use some custom configurations or parameters, but I'm not sure how to do it, even using the low-level native Java wrappers.
According to the javadoc, the Vision class has a setParameter method, but its value is a float, while some of the parameters I need to set are strings (white/black lists).
I could configure them via a file in tessdata/configs, but I don't see any method that takes the config file as a parameter. According to the Tesseract documentation the config file should be passed on the command line, but I don't know how to provide it to the Java class, nor do I understand whether VisionProxyJNI could be useful for this.

I would have liked to see the source of the Vision class to reimplement what I need, but I can't find the source code on git...
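
For reference, here is a minimal sketch of the config-file route mentioned above, done from plain Java without touching the Vision class. It assumes the tesseract command-line tool is installed and that the config file is saved under tessdata/configs; the class name, the config name digits_only and the file names are made up for illustration. tessedit_char_whitelist is the Tesseract variable for a character whitelist. The sketch only renders the config text and builds the command line; it does not start the process.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Sikuli or Tesseract.
class TesseractConfigSketch {

    /** Renders variables in Tesseract's config-file format: one "name value" pair per line. */
    static String renderConfig(Map<String, String> variables) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : variables.entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    /** Builds the argument list: tesseract <image> <outputBase> [configfile...] */
    static List<String> buildCommand(String image, String outputBase, String... configFiles) {
        List<String> cmd = new ArrayList<>(Arrays.asList("tesseract", image, outputBase));
        cmd.addAll(Arrays.asList(configFiles));
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("tessedit_char_whitelist", "0123456789"); // recognise digits only

        // Text to save as tessdata/configs/digits_only (assumed location and name).
        System.out.print(renderConfig(vars));

        // Command line that refers to that config by name.
        List<String> cmd = buildCommand("screenshot.png", "result", "digits_only");
        System.out.println(String.join(" ", cmd));

        // To actually run it (needs tesseract on the PATH):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Calling the command-line tool sidesteps the float-only setParameter, at the cost of starting a process per image, so it is more of a workaround than a replacement for a proper wrapper.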

Question information

Language: English
Status: Solved
For: SikuliX
Solved by: sasha

RaiMan (raimund-hocke) said :
#1

I strongly recommend switching directly to Tess4J for any Tesseract features in your Groovy setup, and switching off the Tesseract usage in SikuliX.
You already have your own mid-level functions, so why invest in bringing light into the jungle when there is a possibility to live at the top of the trees ;-)

The Vision implementation, a Java/JNI interface to the native modules that use the OpenCV and Tesseract C++ APIs, is simply a mess.
With Tesseract it is even worse, since the original developer left in the middle of revising the text find and OCR features for use with Tesseract 3. Nothing has been done in this area since then.

The Vision module will vanish over the next few months anyway, since I am nearly done porting the image search feature to the OpenCV Java API (no more C++ code needed for that). The new classes are Image, ImageFinder and ImageFind.
I will do the same for the Tesseract usage with the support of Tess4J, which I have already refactored to fit into the SikuliX Maven structure.

Sources and other valuable links: https://github.com/RaiMan/SikuliX-2014

RaiMan (raimund-hocke) said :
#2

BTW: if you decide to switch to Tess4J, it would be nice and helpful if you shared some of your findings and implementation details with me ;-)

sasha (vimes-m) said :
#3

I hoped to find another way, but every sign points to Tess4J :)
My only concern is that I already based quite a few scripts on the actual Sikuli implementation and I need to be sure they'll have no problems...
Anyway, I'm more than happy to help you, because without Sikuli I would still be forced to script my bots with AHK... a wonderful tool, but it can't even be compared to scripting in Groovy...

Just tell me what you need, or where to share if I find something that could be useful to Sikuli.

RaiMan (raimund-hocke) said :
#4

I just added Ruby (with massive help from someone else) as a scripting language to the IDE.

Do you think it makes sense to do this for Groovy too?

If yes, where is the interpreter and how is it used?

Tess4J: just come back when you have something working and let me have a look at it (you can contact me privately via my mail at https://launchpad.net/~raimund-hocke).

RaiMan (raimund-hocke) said :
#5

*** My only concern is that I already based quite a few scripts on the actual Sikuli
Just leave it as it is. I will try to be as API compatible as possible.
But people who have gone down to the APIs of TextRecognizer and Vision might have to adapt things.

In your case, I would use Tess4J only for new scripts and features.

sasha (vimes-m) said :
#6

-Groovy: since you can mix pure Java and Groovy, and even use Groovy classes directly from Java classes with Java syntax, I think integration might be easy. All you need is the groovy-all jar from http://groovy.codehaus.org. Some incredible features like @Grab (http://groovy.codehaus.org/api/groovy/lang/Grab.html - I think you'll love this...) are only available if your RootClassLoader is GroovyClassLoader, though, so if you want to support @Grab (or even use it for the IDE) you'll probably have to work a bit more on it.
About the engine: http://groovy.codehaus.org/api/groovy/util/GroovyScriptEngine.html is fairly easy to use. Here are some examples: http://groovy.codehaus.org/Embedding+Groovy, but beware of GroovyShell... I really think you'd prefer GroovyScriptEngine. I say so because I started with GroovyShell and then painfully refactored to GroovyScriptEngine. This way you could even run third-party scripts, with the classpath roots as your only concern.

-Tess4J: I don't have anything working yet, but from what I saw yesterday it should run smoothly with no conflicts. It will just be a matter of removing my jars when I upgrade to the new version of Sikuli. Unluckily for you, I'll probably remove my Vision/OpenCV implementations because of the answer you gave me on the other thread, so my interest in Tess4J is less immediate now. I could refactor a more complicated part that I use in a single script, but since I finally made it work as it is with Vision calls, even this is not a top priority for me. I'd gladly help, though, but I fear you'll need to point me to a task to solve/play with... it's probably better if I just write to you privately at this point ;)
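
On the Groovy embedding point above: a quick way to check whether a Java host can run Groovy at all is the standard javax.script (JSR-223) API. This is a different route from the GroovyScriptEngine class linked above, chosen here only because it compiles without the Groovy jar; it assumes that the Groovy jar on the classpath registers a script engine under the name "groovy". The class and method names are made up for illustration.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical helper, not part of Sikuli or Groovy.
class GroovyJsr223Sketch {

    /**
     * Evaluates a Groovy snippet through JSR-223.
     * Returns Optional.empty() when no engine named "groovy" is registered
     * (Groovy jar missing from the classpath) or when the script evaluates to null.
     */
    static Optional<Object> evalIfAvailable(String script) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("groovy");
        if (engine == null) {
            return Optional.empty();
        }
        try {
            return Optional.ofNullable(engine.eval(script));
        } catch (ScriptException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalArgumentException("Groovy script failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        Optional<Object> result = evalIfAvailable("1 + 1");
        System.out.println(result
                .map(r -> "Groovy says: " + r)
                .orElse("No Groovy engine on the classpath"));
    }
}
```

For running whole script files with their own classpath roots, GroovyScriptEngine as described above is probably still the better fit; the JSR-223 lookup is mainly handy as a smoke test.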

RaiMan (raimund-hocke) said :
#7

Great, thanks.
I will have a look at Groovy in the next few weeks.

Tess4J: not my top priority. I guess I won't dig into that before September. I will contact you if something comes up that you could sensibly help with.