Infinite loop when detecting spaces
Hi all,
As an attempt to create an alternative to the built-in Tesseract OCR, I came up with the following idea (high-level):
1. Create a screenshot of each character (a screenshot for 'a', a screenshot for 'b', etc.)
2. Iterate over each character in a word and compare it to the collection of character screenshots. The screenshot that matches perfectly identifies the letter.
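To make step 2 concrete, here is a minimal sketch in plain Python (not SikuliX code). It assumes each isolated character is already available as a small binarized bitmap; the `TEMPLATES` table, the `recognize` helper and the two glyph shapes are made up for illustration and are not real screenshots.

```python
# Glyphs are tiny binarized bitmaps: tuples of strings, '#' = ink, '.' = background.
# The two templates below are invented examples, not real character screenshots.
TEMPLATES = {
    ("#.#", "#.#", "###"): "u",
    ("###", "#..", "###"): "c",
}

def recognize(glyphs, templates=TEMPLATES, unknown="?"):
    """Map each isolated glyph bitmap to a character by exact equality."""
    return "".join(templates.get(tuple(g), unknown) for g in glyphs)

word = [("###", "#..", "###"), ("#.#", "#.#", "###")]
print(recognize(word))  # prints "cu"
```

An exact-equality lookup like this only works if rendering is pixel-identical every time, which is the same assumption the "perfect match" idea relies on.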
I know it might not be very efficient or quick, but as long as it provides consistent results, that's enough for me.
So the first challenge is "segmentation" (character isolation). To do that, I thought I would detect the spaces between letters, assuming that a bar of empty space 1 pixel wide and a few pixels high acts as a separator. So I created a pattern image that is basically a 1xN bar of white pixels.
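The segmentation idea can also be sketched without image matching at all, by scanning for columns that contain only background pixels. This is plain Python on a toy bitmap, not SikuliX code; `blank_columns`, `split_glyphs` and the sample `image` are hypothetical names and data used only to illustrate the approach.

```python
def blank_columns(rows, background="."):
    """Return indices of columns that contain only background pixels."""
    width = len(rows[0])
    return [x for x in range(width) if all(row[x] == background for row in rows)]

def split_glyphs(rows, background="."):
    """Cut at blank columns; return (start, end) column spans, end exclusive."""
    width = len(rows[0])
    blanks = set(blank_columns(rows, background))
    spans, start = [], None
    for x in range(width + 1):
        is_ink = x < width and x not in blanks
        if is_ink and start is None:
            start = x                  # a glyph begins here
        elif not is_ink and start is not None:
            spans.append((start, x))   # the glyph ended just before x
            start = None
    return spans

# Toy image: two glyphs separated by one fully blank column (index 3).
image = [
    "###.#.#",
    "#...#.#",
    "###.###",
]
print(blank_columns(image))  # prints [3]
print(split_glyphs(image))   # prints [(0, 3), (4, 7)]
```

Note that column 5 is not treated as a gap because the bottom row has ink there; only columns that are empty over the full height count as separators.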
As a next step, I created an image pattern of a short string of plain text and ran the following code to validate that the gaps between letters are detected correctly:
text = find("sampleTex
for x in text.findAll(
    x.highlight(1)
However, it seems that instead of iterating over all the gaps in this text, the loop just finds and highlights the same gap each time. I counted the number of loop iterations and it's 100! (It should be 25, including the spaces between words.)
Any ideas why such behavior might happen?
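One way to narrow this down is to check whether the 100 matches are truly identical or just overlap at slightly different positions. The sketch below is plain Python and assumes the position of each match has been collected as an (x, y) pair inside the loop (in SikuliX, presumably via `getX()` and `getY()` on each match); `group_by_column` and the sample coordinates are made up for illustration.

```python
from collections import defaultdict

def group_by_column(points):
    """Group (x, y) match positions by x so repeated hits on one gap collapse together."""
    columns = defaultdict(list)
    for x, y in points:
        columns[x].append(y)
    return {x: sorted(ys) for x, ys in sorted(columns.items())}

# Invented coordinates; real ones would come from the matches found in the loop.
sample = [(40, 12), (40, 13), (40, 12), (55, 12)]
grouped = group_by_column(sample)
print(len(sample), "matches,", len(grouped), "distinct columns")
```

If many matches share one x with different y values, the bar is probably matching several times within the same gap rather than the loop repeating a single result.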
Cheers,
Eugene S
Question information
- Language: English
- Status: Answered
- For: SikuliX
- Assignee: No assignee