Verification Boxes Misalignment

Asked by Rasinio on 2018-12-18

Hello... again,

I think I'm at the last step. I've gotten the blank PDF to upload and have banded it successfully. The banding looks perfect. I'm also able to load in the directory of PDFs that are filled out. However, when I go to verify, I'm getting verification boxes occluding the areas I'm supposed to be verifying. Another aspect of this is verification boxes only show up for the first set of boxes on each page. It's not recognizing any other questions on the forms, except for the first on each page. When it does recognize the areas, they are grossly misaligned.

I searched around and found on this forum Question #671304. They didn't have my exact issue, but answers from you and a Mr. Erickson suggested that the incorrect PDF dimensions could cause an issue. I also tried what you suggested, but it didn't work. I also tried the centering button on the verification page, but that made things worse.

I checked test_filled_1.pdf and it worked: I could verify, and it recognized all boxes on both pages. So I decided to compare the dimensions of each PDF (the one that works and the one that doesn't) by running the command:

pdfimages -list /path/to/pdf.pdf

...and there were several differences but I'll highlight the ones that seemed to be important.

test_filled PDF
width = 2480
height = 3488
color = gray
comp = 1
bpc = 1
enc = image
interp = yes
x-ppi = 300
y-ppi = 300

the filled PDF I was given
width = 2200
height = 1700
color = rgb
comp = 3
bpc = 8
enc = jpeg
interp = no
x-ppi = 259
y-ppi = 155
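
One way to read the two listings is to convert pixels and resolution into the physical size each image occupies on the page. The following is a minimal sketch using only the numbers above; physical_size_inches is a hypothetical helper, not part of queXF or pdfimages.

```python
def physical_size_inches(width_px, height_px, x_ppi, y_ppi):
    """Return (width_in, height_in) implied by pixel size and resolution."""
    return width_px / x_ppi, height_px / y_ppi


# Values copied from the two pdfimages listings above.
test_filled = physical_size_inches(2480, 3488, 300, 300)
given = physical_size_inches(2200, 1700, 259, 155)

for name, (w, h) in (("test_filled", test_filled), ("given", given)):
    print(f"{name}: {w:.2f} x {h:.2f} in")
```

Under these numbers the first image comes out at about 8.27 x 11.63 in and the second at about 8.49 x 10.97 in. The second image is wider than it is tall in pixels, yet its unequal x/y resolution places it on a roughly portrait page, which may mean it was rescaled non-uniformly somewhere in the prepping.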

All I know is that the raw format of the PDFs needed to be separated (or something to that effect), and DocHub, Microsoft Print to PDF, and Adobe were used to do that. I didn't do any of the 'prepping', so I don't know what the options were during the save/export, if any.

My questions are:
1) Is my issue likely due to resolution problems?
2) Is there a way to make sure that PDFs have the resolution of 300 dpi and in grey scale (like an Adobe-like program)?

Thank you,
Rasinio
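
On question 2, one possible approach (an assumption, not something confirmed in this thread) is to re-render each page at 300 dpi in grayscale with Poppler's pdftoppm, from the same toolset as pdfimages. The sketch below only builds the command line; rasterize_command and the "page" output prefix are hypothetical, and the resulting PNG files would still have to be reassembled into a PDF before a directory import.

```python
import shlex


def rasterize_command(pdf_path, out_prefix, dpi=300):
    """Build a pdftoppm command that renders each page as a grayscale PNG."""
    return ["pdftoppm", "-r", str(dpi), "-gray", "-png", pdf_path, out_prefix]


cmd = rasterize_command("/path/to/pdf.pdf", "page")
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

Re-rendering changes resolution and color depth only; it would not undo any stretching already present in the source image.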

Question information

Language: English
Status: Solved
For: queXF
Assignee: No assignee
Solved by: Adam Zammit
Solved: 2019-01-14
Last query: 2019-01-14
Last reply: 2019-01-11

Adam Zammit (adamzammit) said : #1

How does the scanned form look if you upload it to the "Test form compatibility with queXF" function?

Launchpad Janitor (janitor) said : #2

This question was expired because it remained in the 'Needs information' state without activity for the last 15 days.

Rasinio (rgraves8410) said : #3

Hey Adam,

I'm reposting this here because I'm not sure whether you received it by direct email. There are attachments in the direct email (sent 21 December) that correspond to this.

I went to the 'Page setup' option, chose my form, clicked the 'Page setup disabled (click to enable)' link to enable it, then clicked it again to disable it. I checked all the other pages and they all showed the link 'Page setup disabled (click to enable)'. I also adjusted the threshold: the PDF was coming out very light at 130-150, so I raised it to 205. After that, when I imported the filled-out PDF (the one I attached yesterday) via 'Test form compatibility', all pages were aligned.
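
Why a higher threshold darkens a faint scan can be illustrated with a toy example. This is only a sketch: it assumes the threshold is compared against 0-255 grayscale values and that pixels below it count as ink, which is an assumption about queXF rather than something confirmed in this thread. binarize, ink_fraction, and the sample pixel values are hypothetical.

```python
def binarize(gray_rows, threshold):
    """Map 0-255 grayscale values to 1 (ink) if below threshold, else 0."""
    return [[1 if value < threshold else 0 for value in row] for row in gray_rows]


def ink_fraction(bitmap):
    """Return the share of pixels marked as ink."""
    total = sum(len(row) for row in bitmap)
    return sum(map(sum, bitmap)) / total


# Made-up pixels from a faint scan: the "ink" is light gray, not black.
faint_scan = [[250, 200, 180], [160, 140, 250]]

for threshold in (130, 150, 205):
    print(threshold, ink_fraction(binarize(faint_scan, threshold)))
```

With these sample pixels, no ink survives at 130, one pixel in six at 150, and four in six at 205.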

However, when I try to import via the Directory to verify, I still get the same issue. So, to make absolutely sure that no previous setting was affecting the outcome, I went into MySQL and cleared the following tables (via delete from <table_name>;):

- questionnaires
- pages
- process
- process_log
- processforms
- forms
- formpages
- verifierquestionnaire
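
The clearing step above can be written out as one statement per table. This is a minimal sketch that only prints the SQL; the table names are taken from the list, delete_statements is a hypothetical helper, and whether the order matters for this schema is not known from the thread, so a database backup beforehand would be prudent.

```python
# Table names as listed above.
TABLES = [
    "questionnaires", "pages", "process", "process_log",
    "processforms", "forms", "formpages", "verifierquestionnaire",
]


def delete_statements(tables):
    """Return one DELETE statement per table, with the name backtick-quoted."""
    return [f"DELETE FROM `{name}`;" for name in tables]


for stmt in delete_statements(TABLES):
    print(stmt)
```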

With the new image threshold setting of 205, I tested the import of the blank PDF. All pages looked flawless in contrast and alignment. I imported that and banded it. The banding was also perfect. Then I did what you suggested with the 'Page setup disabled (click to enable)' links and imported from directory. That did something very helpful: it now recognizes an 'X' in a box on all pages, but among the boxes that contain text it still only recognizes the first set on each page. It also only gives me the orange box (which lets the user choose the correct response if it's incorrect) overlaying the purple box (what the program thinks the participant filled out) on the first set of boxes on each page (please see attached Page2).

I suspect the reason it isn't recognizing boxes for anything else (text related) is that I cannot see the text from any box group other than the first on each page (please see attached Page4). The next thing I tried was raising the threshold as high as 225 and lowering it as low as 120. Each time I cleared:

- processforms
- forms
- formpages
- verifierquestionnaire

I then restarted Apache and re-imported via directory. With every setting, the result still looks like Page4 of the attachment, and at the extremes (225 and 120) no text shows at all, regardless of whether it is in the first set of boxes.

What I'm thinking is supposed to happen during verification is that all multiple/binary choice boxes are supposed to be recognized by the purple boxes. Those multiple/binary choice boxes are supposed to be overlaid with the orange boxes. All text and numeric fields are supposed to have blank boxes that the user is able to enter in below what the participant wrote. Then the user goes through and does nothing if what the participant wrote looks correct, or types in the boxes what the text is supposed to say, or clicks the correct box on a multiple/binary choice (if there's ambiguity). This is what it seems like based on how close I'm getting to this working as intended.

- Does verification have its own image threshold that I'm missing in the verify.php script?
- Does clicking the 'Page setup disabled (click to enable)' link update something?
- Is my inability to see any text (other than if it's part of the first set of boxes) due to the quality of the PDF itself or something else?

Thank you,
Rasinio

Best Adam Zammit (adamzammit) said : #4

Hi Rasinio:

"What I'm thinking is supposed to happen during verification is that all multiple/binary choice boxes are supposed to be recognized by the purple boxes. With those multiple/binary choice boxes, are supposed to be overlaid with the orange boxes. All text and numeric fields are supposed to have blank boxes that the user is able to enter in below what the participant wrote. Then the user goes through and does nothing if what the participant wrote looks correct, or types in the boxes what the text is supposed to say, or clicks the correct box on a multiple/binary choice (if there's ambiguity). This is what it seems like based on how close I'm getting to this working as intended."

This is correct. Pressing Enter then moves to the next box on the page that requires action.

- Does verification have its own image threshold that I'm missing in the verify.php script?

No. Currently verification and import use the same image threshold. However, you can change settings such as SINGLE_CHOICE_MAX_FILLED, SINGLE_CHOICE_MIN_FILLED, etc., which determine how filled a box must be to be marked as "selected" or not.

- Does clicking the 'Page setup disabled (click to enable)' link update something?

Yes this re-calculates the page edges which can sometimes be incorrectly specified when importing a banding XML file. If Page setup is "disabled" then queXF will find the page edges based on proportions of the page, so any size scan should work. If page setup is "enabled" then it will use the specific areas selected in the page setup page to search for corner edges.

- Is my inability to see any text (other than if it's part of the first set of boxes) due to the quality of the PDF itself or something else?

Try pressing "enter" to move to the next box and see if you can then see what has been written on the original form. The white box is the overlay of what is to be manually entered during verification

Adam

Rasinio (rgraves8410) said : #5

Hey Adam,

- "Try pressing "enter" to move to the next box and see if you can then see what has been written on the original form. The white box is the overlay of what is to be manually entered during verification"

I didn't know it was 'enter' that moves on to the next set of boxes; I was told it was the space bar. So not seeing all of the text while verifying until you're on a selected question must be part of the program, because I can now see the text as I move through the PDF.

Okay...

"Yes this re-calculates the page edges which can sometimes be incorrectly specified when importing a banding XML file. If Page setup is "disabled" then queXF will find the page edges based on proportions of the page, so any size scan should work. If page setup is "enabled" then it will use the specific areas selected in the page setup page to search for corner edges."

This solves the misalignment (the original issue).

Thank you again,
Rasinio

Rasinio (rgraves8410) said : #6

Thanks Adam Zammit, that solved my question.