Illegible PDF Text After Import

Asked by Rasinio on 2018-11-27

Hello again,

After installing php-gd, everything seemed to be working. When I import a PDF, all barcodes are recognized as unique and I get the message that the form has been successfully imported. When I try to band it and it shows a preview of the PDF, the PDF text comes up unreadable. The readability is similar to when you open a serialized file with MS Word. The boxes from the form are in the correct spot with the correct number of boxes, but all of the text is completely illegible. I thought it might be something with the PDF or banding file provided to me, so I tried the test PDF (specifically test_original.pdf) and banding file (test_original.xml) that come with queXF. These have a similar problem, except that I don't see any text at all: the entire form is black with the exception of the boxes. One form does show me legible text after uploading, and that is test_blank.pdf.

Everything *appears* to be lined up according to the Page test instructions when testing form compatibility. When I go to the Page setup (at the bottom of the admin page), everything there also appears to be correctly lined up on all forms. I don't know whether the boxes would turn a different color if there were a misalignment.

I'm not so sure that the illegible text has much to do with the alignment but more with how it's being rendered, but I'm just guessing.

In the error log I get a lot of notices:

PHP Notice: imagecolorat(): 2105,3316 is out of bounds in /var/www/html/quexf/functions/functions.image.php on line 810
PHP Notice: imagecolorat(): 2105,3316 is out of bounds in /var/www/html/quexf/functions/functions.image.php on line 814
PHP Notice: imagecolorat(): 2106,3316 is out of bounds in /var/www/html/quexf/functions/functions.image.php on line 814
[...and so on to...]
PHP Notice: imagecolorat(): 2222,3316 is out of bounds in /var/www/html/quexf/functions/functions.image.php on line 814

Regarding the above PHP notices, I found https://stackoverflow.com/questions/38340662/php-get-color-at-last-pixels-of-image-error, but I don't know if that solution would apply here. The notices are similar to question #294613.
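For readers who hit the same notices: PHP's imagecolorat() emits "is out of bounds" when the requested x or y is outside the image, for example when a loop runs one pixel past the last column or row. The following is only a minimal sketch of that kind of bounds check, written in Python over a plain list of rows. It is not queXF's code, and the helper name pixel_or_none is hypothetical.

```python
def pixel_or_none(image, x, y):
    """Return image[y][x], or None when (x, y) lies outside the image."""
    height = len(image)
    width = len(image[0]) if height else 0
    # Valid coordinates run from 0 to width - 1 and from 0 to height - 1.
    if 0 <= x < width and 0 <= y < height:
        return image[y][x]
    return None


# A toy "image": 2 rows of 3 grey values, so valid x is 0..2 and valid y is 0..1.
toy = [
    [0, 128, 255],
    [255, 128, 0],
]

print(pixel_or_none(toy, 2, 1))  # last valid pixel
print(pixel_or_none(toy, 3, 1))  # x equals the width, one past the last column
```

Whether the reads in functions.image.php overshoot by one pixel or by more is not something this thread establishes.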

My set up is the same as when I last posted, with the exception of PHP:
PHP:
PHP 7.2.12-1+ubuntu18.04.1+deb.sury.org+1 (cli) (built: Nov 12 2018 09:55:41)

Ghostscript:
Version 9.22

MySQL:
mysql Ver 14.14 Distrib 5.7.23, for Linux (x86_64) using EditLine wrapper

Apache:
Server version: Apache/2.4.29 (Ubuntu); Server built: 2018-06-07T21:10:10

ADOdb:
libphp-adodb/bionic,bionic,now 5.20.9-1 all [installed] (from 'apt list libphp-adodb')

FireFox:
Mozilla Firefox 61.0.1

To be sure that php-gd is installed:
php-gd/bionic,bionic,now 2:7.2+68+ubuntu18.04.1+deb.sury.org+1 all [installed] (from 'apt list php-gd')

Is there a reason why the test_blank.pdf would render correctly, but the test_original.pdf renders in all black?

Thank you again,
Rasinio

Question information

Language: English
Status: Solved
For: queXF
Assignee: No assignee
Solved by: Rasinio
Solved: 2018-12-07
Last query: 2018-12-07
Last reply: 2018-12-05

Adam Zammit (adamzammit) said : #1

Hi Rasinio,

Thank you for your comprehensive report. Although this shouldn't cause an issue for processing scanned forms, you can adjust the config.inc.php setting:

define('IMAGE_THRESHOLD', 221);

This is what queXF uses to decide what is "black" and what is "white". 255 is the minimum and 0 is the maximum. Please adjust and test using the "page test" function and see how you go.
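As a rough illustration of what such a cutoff does, here is a minimal Python sketch. It assumes grey values run from 0 (darkest) to 255 (lightest) and that a pixel counts as "black" when its value falls below the threshold. That rule is an assumption made for the example, not taken from queXF's source, and count_black is a hypothetical helper.

```python
IMAGE_THRESHOLD = 221  # the value quoted in the reply above


def is_black(grey, threshold=IMAGE_THRESHOLD):
    # Assumed rule: anything darker than the threshold is treated as black.
    return grey < threshold


def count_black(pixels, threshold=IMAGE_THRESHOLD):
    """Count how many grey values in `pixels` would be treated as black."""
    return sum(1 for grey in pixels if is_black(grey, threshold))


# Made-up grey values, from very dark to pure white.
sample = [0, 100, 160, 200, 230, 255]

print(count_black(sample, 221))
print(count_black(sample, 150))
```

Under this assumed rule, lowering the threshold classifies fewer pixels as black, so the rendered page gets lighter.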

Adam

Rasinio (rgraves8410) said : #2

Hello Adam,

Thank you. I didn't see the define function for an image threshold in the config.inc.php file, so I added:

define('IMAGE_THRESHOLD',150);

...to it. It did lighten the test page to where I can now see text. However, the text itself is still illegible. What is supposed to be read as 'Section A:' in one of the PDFs I've been given, is rendered as something like:

'6HFWQ' and 'RS?[?]%' overlaid on top of one another (i.e. the 'R' is over the '6' and 'H', the 'S' over the 'H' and 'F' etc.)

I don't know how the PDF was created, but there seems to be something wrong with the way the individual characters are being rendered/interpreted from the original PDF (which is in Spanish) to what I'm seeing after importing. I thought it might be because the PDF I have is in Spanish, but this scrambling effect also applies to the test_original.pdf, which is in English. This scrambling effect does *not* apply to the test_blank.pdf though.

I'm still getting the same PHP Notices in my error logs.

Is there another setting that can be added to the config.inc.php file or added elsewhere that deals with how text is interpreted upon importing? Is there something that I would need to reinstall? Does the way that the PDFs are being read influence whether or not the banding XML will work? When I try to band a form, after a split second I'm left with a blank screen on all three PDFs I've tried (test_original, test_blank, and the one I've been given).

Thank you again,
Rasinio

Adam Zammit (adamzammit) said : #3

Hi again Rasinio,

The text rendering of the PDF shouldn't affect the banding or the processing of forms. It is probably an issue with fonts or locale settings in Ghostscript. If you want me to look at it further, please send me an example PDF directly and I will have a look.

The notices are safe to ignore.

I am not sure what the issue is with the banding - it is possible that it is due to the newer version of PHP being used - I have only tested up to PHP 7.1

Adam

Rasinio (rgraves8410) said : #4

Hey Adam,

The attached is the PDF that was given to me. The attached screenshot is
what I'm seeing after importing.

Rasinio

Rasinio (rgraves8410) said : #5

Just realized I didn't send the banding file with it.

Adam Zammit (adamzammit) said : #6

The attachments didn't come through. Can you please send them directly to my email: <email address hidden>

Rasinio (rgraves8410) said : #7

Answers via email not shown here.

Installing gsfonts fixed the issue with the illegible text.
Installing php-xml fixed the issue with the banding file not banding when uploaded with the PDF.
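
For anyone checking a similar setup, `php -m` lists the modules loaded by the command-line PHP. The Python sketch below scans that kind of output for missing modules. The module names "gd" and "xml" are assumptions inferred from the package names php-gd and php-xml, the sample output is made up, and missing_modules is a hypothetical helper rather than part of queXF.

```python
def missing_modules(php_m_output, required):
    """Return the names in `required` that do not appear in `php -m` output."""
    loaded = {
        line.strip().lower()
        for line in php_m_output.splitlines()
        # Skip blank lines and section headers such as "[PHP Modules]".
        if line.strip() and not line.strip().startswith("[")
    }
    return [name for name in required if name.lower() not in loaded]


# Made-up example of what `php -m` might print on a machine without php-xml.
sample = """[PHP Modules]
Core
gd
json

[Zend Modules]
"""

print(missing_modules(sample, ["gd", "xml"]))
```

The PHP that Apache runs can load a different configuration from the command-line PHP, so a clean result here is a hint rather than proof.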