[1.1.0] Tesseract (Region.text()): how to switch to a different language pack
This only works with version 1.1.0 builds newer than March 23rd, 2015.
--- background knowledge:
The SikuliX local application data folder
SikuliX stores some information that is needed at runtime in a system-dependent special folder in the user's home area:
Windows: in a folder Sikulix in the folder pointed to by the environment variable %APPDATA%
Mac: in ~/LibraryApplic
Linux: in ~/.Sikulix
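If you want to locate that folder from a helper script, the rules above can be sketched as below. This is only a sketch for plain Python outside of SikuliX: the function name sikulix_app_data is made up for illustration, only the Windows and Linux rules are covered (look up the Mac folder yourself), and under Jython platform.system() reports "Java", so there you would have to pass the system name explicitly.

```python
import os
import platform


def sikulix_app_data(system=None, environ=None):
    """Return the SikuliX local application data folder.

    system  -- "Windows" or "Linux"; defaults to platform.system()
    environ -- mapping with the environment variables; defaults to os.environ
    """
    system = system or platform.system()
    environ = os.environ if environ is None else environ
    if system == "Windows":
        # folder Sikulix inside the folder %APPDATA% points to
        return os.path.join(environ["APPDATA"], "Sikulix")
    if system == "Linux":
        # ~/.Sikulix
        return os.path.join(environ["HOME"], ".Sikulix")
    # Other systems (including Mac) are deliberately not guessed here.
    raise NotImplementedError("look up the folder for %s manually" % system)
```

Called without arguments it uses the current system and environment, for example print(sikulix_app_data()).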
Currently you might find the following subfolders there:
Extensions* - intended to contain extension and plugin artefacts (currently empty)
Lib* - the stuff needed to support scripting with Jython and JRuby
SikulixDownloads - non-SikuliX downloads needed for setup (Jython, JRuby, Tesseract-tessdata, ...)
SikulixStore* - contains other files, that are loaded/produced during runtime (debug, options, last script from net, ...)
SikulixTesseract* - files to support the usage of Tesseract (currently tessdata)
--- get the Tesseract language pack
--1. go to the page:
--2. select the language you want to use and download the pack.
what you download are ....tar.gz files that have to be unpacked with a tool like 7-Zip or The Unarchiver.
what you get is a folder structure (-- are folders):
--3. copy the language files to the SikuliX area
copy all files from the downloaded folder tessdata to the folder tessdata in the subfolder SikulixTesseract of the SikuliX local application data folder mentioned above (where you already have eng.traineddata).
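Steps 2 and 3 can also be done by a small script instead of by hand. The following is a rough sketch, assuming the downloaded archive contains a folder named tessdata somewhere in its tree; the function name install_language_pack and its parameters are hypothetical, and only the Python standard library is used. Only unpack archives you trust, since extractall() writes whatever the archive contains.

```python
import os
import shutil
import tarfile
import tempfile


def install_language_pack(archive_path, app_data_dir):
    """Unpack a language pack and copy its tessdata files to SikuliX.

    archive_path -- path to the downloaded .tar.gz file
    app_data_dir -- the SikuliX local application data folder
    Returns the sorted list of file names that were copied.
    """
    target = os.path.join(app_data_dir, "SikulixTesseract", "tessdata")
    if not os.path.isdir(target):
        os.makedirs(target)

    work_dir = tempfile.mkdtemp()
    copied = []
    try:
        # unpack into a scratch folder
        with tarfile.open(archive_path, "r:gz") as archive:
            archive.extractall(work_dir)
        # copy the files of every folder named "tessdata" in the unpacked tree
        for root, _dirs, files in os.walk(work_dir):
            if os.path.basename(root) != "tessdata":
                continue
            for name in files:
                shutil.copy(os.path.join(root, name),
                            os.path.join(target, name))
                copied.append(name)
    finally:
        # remove the scratch folder again
        shutil.rmtree(work_dir, ignore_errors=True)
    return sorted(copied)
```

The returned list tells you which language files are now available next to eng.traineddata.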
--4. switch on the OCR feature and select the language to use for recognition
where language should be the name taken from the language file (for example chi_sim for chi_sim.traineddata).
using Region.text() should now return reasonable results.
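As a sketch of step 4, the two settings could be wrapped in a small helper. Note the assumptions: the field names OcrTextRead and OcrLanguage are taken from the Settings class of SikuliX 1.1.0 as far as I know it, so check them against your version; the helper name select_ocr_language and the optional check for the language file are made up for illustration.

```python
import os


def select_ocr_language(settings, language, tessdata_dir=None):
    """Switch on text reading and select the OCR language.

    settings     -- the SikuliX Settings class (assumed to have the
                    fields OcrTextRead and OcrLanguage)
    language     -- name of the language file without ".traineddata"
    tessdata_dir -- optional: folder that should contain the language file
    """
    if tessdata_dir is not None:
        expected = os.path.join(tessdata_dir, language + ".traineddata")
        if not os.path.isfile(expected):
            raise IOError("language file not found: " + expected)
    settings.OcrTextRead = True      # assumed field: switch on Region.text()
    settings.OcrLanguage = language  # assumed field: language to recognize
```

In a SikuliX script you would call it as select_ocr_language(Settings, "chi_sim") before the first use of Region.text().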
- on the download page you select the entry with the title
Chinese (Simplified) language data for Tesseract 3.02
- and get the following file on your machine:
- after unpacking you get
- copy chi_sim.traineddata to the subfolder tessdata in SikulixTesseract in the SikuliX local application data folder
- use in script
- to switch to another language later in the same script, just use this: