Answer detection works in one form but not in another with the same settings

Asked by Adam Reiner on 2019-09-04

Dear Adam

I have imported 2 different forms from 2 different surveys made with LimeSurvey, and now want to optimize the "min_filled" and "max_filled" values in config.php so that the detection of chosen answers has the best possible success rate. While testing the detection rate during form verification, I found that detection works in only one form (currently roughly 80% success), whilst it does not detect a single ticked box in the other form.
I think this difference is too big to be random and more likely stems from some setting that differs between the two forms.
But *typical user claim that will be proved wrong* all settings are the same!
- both were scanned 300dpi, monochrome
- paper quality is the same
- both had the same LS export settings
- both have page setup enabled
- the orange markers showing where answers could have been ticked align well in both
- both were tested with the same values in config.php (min/max_filled)
- I even re-scanned the non-working form on the same scanner, since the first attempt was on another one
- ..

What did I forget / what do I overlook? Do you have any idea?

Question information

Language: English
Status: Solved
For: queXF
Assignee: No assignee
Solved by: Adam Reiner
Solved: 2019-10-24
Last query: 2019-10-24
Last reply: 2019-10-24

Adam Zammit (adamzammit) said : #1

Hi Adam,

Please try scanning in greyscale/grayscale as scanners have different methods of producing monochrome images which can be quite variable.

Adam

Adam Reiner (adamreiner) said : #2

Thanks! I had read that monochrome generates the best results and therefore stuck with it. Knowing that switching to greyscale can do the trick in the rare case of monochrome not working is quite helpful! (This already bumped the detection rate from 0% to 50%.)
So now I can tackle the fine-tuning phase in order to maximise detection rates. Has your experience shown any rules of thumb about when to choose greyscale or monochrome, or which FILLED settings are recommended in which cases? Can I assume that the settings you stated in question #676098 are 'best practice', or were they just a random shot (hence your "eg" in front of them)?

I apologize if I was meant to ask this kind of second question in a separate thread (and I am wondering whether to click "problem solved" or "I still need an answer").

Adam Zammit (adamzammit) said : #3

No problems - that is good to hear.

The best way is to look in the database after a few forms have been imported.

Have a look at the "formboxes" table - especially the "filled" column - run something like this:

select * from formboxes where fid = 10000 order by filled asc;

(obviously change the fid value to the id of a form you have already imported).

Check the values in the "filled" column and see where they clump up indicating filled vs empty boxes. You can use these values to assist in determining suitable max and min filled boxes.
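The clumping described above can also be checked programmatically. The following is a minimal Python sketch, assuming the "filled" values returned by the query have been exported into a plain list. The function name largest_gap and the sample numbers are hypothetical and not part of queXF. It sorts the values and reports the widest gap between neighbouring values, which is where the two clumps separate.

```python
def largest_gap(values):
    """Return (low, high): the pair of neighbouring sorted values with the widest gap."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values to find a gap")
    best_low, best_high = ordered[0], ordered[1]
    # Walk over each pair of neighbours and keep the pair that is furthest apart.
    for low, high in zip(ordered, ordered[1:]):
        if high - low > best_high - best_low:
            best_low, best_high = low, high
    return best_low, best_high


# Hypothetical "filled" values, made up for illustration only.
sample = [0.31, 0.35, 0.38, 0.42, 0.86, 0.88, 0.91, 0.93]
low, high = largest_gap(sample)
print("widest gap lies between", low, "and", high)
```

The two returned values bound the gap between the clumps in this sample. Which clump corresponds to filled boxes, and how the values map onto the min/max filled settings, should be confirmed against the actual forms.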

Adam

Adam Reiner (adamreiner) said : #4

Dear Adam

Thanks again for your help, and apologies for my late answer. We achieve very good results with the following FILLED settings:

define('MULTIPLE_CHOICE_MIN_FILLED','0.8');
define('MULTIPLE_CHOICE_MAX_FILLED','0.4');
define('SINGLE_CHOICE_MIN_FILLED','0.8');
define('SINGLE_CHOICE_MAX_FILLED','0.4');

So please consider this solved. However, we have stumbled upon the following three issues that we could not solve ourselves:

1: From time to time, importing a filled form results in multiple copies of that single form stacking up in the verify mode (typically 4-8 copies each). Whether this occurs or not seems to be random.

2: Some forms seem to be imported correctly, yet never reach the verify mode. One of them appears neither in "successfully imported files" nor in "Failed imported files".

3: Some forms are aligned perfectly, with all ticked boxes being detected. For other forms, most pages are also aligned perfectly with all ticks detected, but single pages have no ticks detected at all because the alignment is far off (judged by the orange answer section markers). For example, we have a questionnaire with 31 pages. Pages 22, 24 and 29 (or 30, depending on the form) are misaligned with no or very few detected boxes, while all other pages of that exact form are aligned and detected perfectly.
This seems even weirder to us since we use queXML banding files exported by LimeSurvey rather than banding manually.

Do you have any advice for these issues?

Best
Adam

Adam Zammit (adamzammit) said : #5

Hi again Adam,

Glad to hear the new filled settings helped.

1. The same form should not be able to be imported more than once - the system should avoid this by filename and by file hash. We have seen this occur when the "import" process is running multiple times in the background. Please try and make sure only one import process is running at a time.

2. An imported form should always appear in either the successfully imported or failed imported list - otherwise it should import in the next cycle. This could be a file permission issue. Also check if it appears in the "handle missing pages" section of the system.

3. Each page needs to be aligned using the page alignment marks. It is possible that some interference on the scan is causing this issue. Also check the "page setup" function to see if the area allowed for detection of the corner boxes is large enough. Also I would suggest uploading the form that has mis-aligned pages to the "test form" function, then jump to the page that is mis-aligned and you may see why it is not being detected properly.
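For point 1, one generic way to make sure only one import process runs at a time is to guard whatever starts the import with an exclusive lock file. The following is a minimal Python sketch of that idea, not queXF's own mechanism. It assumes a Unix-like system (it relies on the fcntl module), and the function name try_acquire_lock and the lock file location are hypothetical.

```python
import fcntl
import os
import tempfile


def try_acquire_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if the lock is already held."""
    handle = open(lock_path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


# A fresh temporary directory is used here so the demo is repeatable;
# a real wrapper would use one fixed, agreed-upon path instead.
lock_path = os.path.join(tempfile.mkdtemp(), "import_example.lock")

first = try_acquire_lock(lock_path)   # stands in for the first import run
second = try_acquire_lock(lock_path)  # stands in for an overlapping second run
print("first run got the lock:", first is not None)
print("second run got the lock:", second is not None)

# Closing the file releases the lock so a later run can proceed.
if first is not None:
    first.close()
```

A wrapper built this way would simply exit when it gets None, leaving the import that is already running to finish on its own.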

Adam

Adam Reiner (adamreiner) said : #6

Dear Adam

We have handed the first two issues over to our IT colleague. She will try to take care of them.

Regarding the 3rd issue: when uploading the form to the "test form" function, everything seems to be fine and aligns properly.
However, in the verify mode pages 22, 24 (and 29) are misaligned, so the orange marks are off the answer sections and ticked boxes aren't detected. I wrote page 29 in brackets since it's misaligned on one filled form but aligned on the other. Pages 22 and 24 are misaligned on both filled forms. The rest of the 32 pages are aligned and detected correctly, even in verify mode. The areas allowed for detection of corner lines (corner boxes work much worse for us) seem to be large enough and are the same size on every page in "page setup".
If I understand the possible cause "interference on the scan" correctly, the aligned and misaligned pages would very likely change from form to form, right? Do you have any ideas on how to tackle scan interference issues?

Best
Adam

Adam Zammit (adamzammit) said : #7

Hi again Adam,

Regarding the page alignment issues - load the scanned form into the "page test" function and then navigate to the particular page that is being mis-aligned. You may then find out why queXF is mis-detecting the page edges. Sometimes it is a particular artefact on the form, and then using "page setup" to adjust the detection area will fix it.

If you still run into trouble I'm happy to take a look at the form if you want to send it directly to me.

Adam

Adam Reiner (adamreiner) said : #8

Dear Adam

We finally made it. It was actually the layout on some pages that led to the misalignment (mostly free text boxes, which the blue alignment lines attached to instead of the corner lines). However, the misalignment was not always visible in the test form view. We adjusted the green boxes nevertheless (mostly made them smaller), and after 3-5 attempts every page and even every tick gets aligned and detected properly :) It seems that those forms need some very individual treatment :)

Thanks for your help and patience!

Best
Adam