Validating PDF Layout and content with exact similarity --- use similar(0.99)

Asked by Sudipto Paul

Hi,

We're trying to validate PDF report data - content, as well as layout - using Sikuli.

Initially we thought of taking a screenshot of the whole page of the report.
But some of the data in the report is dynamic even for the same input data -- for example, the report-generation timestamp changes on every run, and it appears at 2 or 3 different locations within the report page.

The Report data is in tabular form.

Some of the cells in the table are very data-intensive - up to 15 rows and columns of data (the structure is irregular, in the sense that not all rows have the same number of columns or cells).
One of the cells can have very large image embedded in it.

So what is the best way to do this?

We thought of this strategy:

Horizontal parse-
----------------------
If a given row of data contains no dynamic data in any cell, we would try to combine the cells into a single Sikuli snapshot.
We repeat this for each such row.
For cells that contain dynamic data, we do not take a snapshot; instead we break the row into multiple regions and take snapshots only of the regions whose cells hold constant data.

Vertical parse-
----------------------
We repeat the same strategy as above, but now take the images column-wise.
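
As a rough sketch of the splitting step described above (not actual Sikuli code): assuming each cell is known as a bounding box plus a flag marking it as dynamic, consecutive static cells can be merged into one region to capture, with dynamic cells breaking the run. The `Cell` record and the function names are hypothetical and only for illustration; the same function works for a row or for a column of cells.

```python
from collections import namedtuple

# Hypothetical cell record: bounding box in screen pixels, plus a flag
# saying whether the cell's content changes between report runs.
Cell = namedtuple("Cell", "x y w h dynamic")


def static_runs(cells):
    """Group consecutive static cells; every dynamic cell ends a run."""
    runs, current = [], []
    for cell in cells:
        if cell.dynamic:
            if current:
                runs.append(current)
                current = []
        else:
            current.append(cell)
    if current:
        runs.append(current)
    return runs


def bounding_box(run):
    """Smallest (x, y, w, h) rectangle covering every cell in the run."""
    left = min(c.x for c in run)
    top = min(c.y for c in run)
    right = max(c.x + c.w for c in run)
    bottom = max(c.y + c.h for c in run)
    return (left, top, right - left, bottom - top)


def snapshot_regions(cells):
    """Regions to capture for one row (or one column) of cells."""
    return [bounding_box(run) for run in static_runs(cells)]


# Example row: two static cells, one dynamic (timestamp) cell, one static cell.
row = [
    Cell(0, 0, 100, 20, False),
    Cell(100, 0, 80, 20, False),
    Cell(180, 0, 60, 20, True),
    Cell(240, 0, 120, 20, False),
]
regions = snapshot_regions(row)  # two regions, one on each side of the dynamic cell
```

Each resulting tuple could then be turned into a Sikuli region for capturing, which is left out here because it needs the Sikuli runtime.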

When these 2 parses are done, we have validated the data of all cells that contain fixed or static data.
For those cells, we have also validated the layout of the data, such as the relative location of cells, their neighbors, the width and height of the cells, and the gap between neighbouring cells' boundaries where applicable.

Questions:
-------------
1) For the cells that have dynamic data, how do we do such a thorough validation?
2) We took a snapshot of a densely populated cell having text data, and of a neighboring cell that has a large image. In Sikuli X-1.0rc3, the Pattern Settings - Matching preview is now working fine, thank you. However, one observation (which also applies to cells with less data) is that the similarity rating shows a match up to 99%, but fails at 100%. For example, I recorded it just now and tried to set the similarity to 100%, without changing anything in the desktop environment or the PDF report UI settings, zoom levels etc. So the question is:
a) Why are we not able to take this to 100%?
b) We chose 100% similarity because only then do we know that our expected result is met during the test run - please comment on this.

Regards,
Sudipto.

Question information

Language: English
Status: Solved
For: SikuliX
Assignee: No assignee
Solved by: RaiMan
Best RaiMan (raimund-hocke) said :
#1

My experience:

Pattern(img).similar(1.0)

does not work correctly in many cases (don't ask me why, might be a bug).

I found that

Pattern(img).similar(0.99)

always does the job when you want an exact match.
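
A minimal sketch of how this workaround might be wrapped in a test helper. The helper name `failed_matches` and the `is_on_screen` callback are hypothetical; the callback stands in for the real Sikuli lookup so that the snippet also runs outside Sikuli.

```python
# 0.99 is used as the stand-in for an "exact" match, as suggested above.
EXACT = 0.99


def failed_matches(images, is_on_screen, similarity=EXACT):
    """Return the images that were not found at the given similarity."""
    return [img for img in images if not is_on_screen(img, similarity)]


# Inside a Sikuli script the callback could be, for example:
#   is_on_screen = lambda img, s: exists(Pattern(img).similar(s), 0)
# and the check would then be:
#   missing = failed_matches(["row1.png", "row2.png"], is_on_screen)
# (the image file names are placeholders)
```

An empty result means every expected region was found; anything else lists the snapshots that did not match.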

BTW: it is very ambitious to use Sikuli for this purpose. I love Sikuli, but why not parse the PDF source to check it?

Sudipto Paul (asudipto) said :
#2

Hi RaiMan

I wrote code that uses the Adobe Acrobat API to read the elements in the PDF - this way I was able to validate the text at each expected location.

But this requires a lot of coding, since there are dynamic fields that have to be dealt with separately.
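
To illustrate what dealing with a dynamic field separately can look like on the text side, here is a small sketch that masks timestamps before comparing extracted text with the expected text. It is not the Acrobat API code mentioned above; the timestamp layout and the function names are assumptions and would need adapting to the real report.

```python
import re

# Assumed timestamp layout, e.g. "2011-03-14 09:26:53"; adjust to the real report.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def mask_dynamic(text, placeholder="<TIMESTAMP>"):
    """Replace every timestamp with a fixed placeholder."""
    return TIMESTAMP.sub(placeholder, text)


def same_report_text(actual, expected):
    """Compare two extracted texts while ignoring the timestamp values."""
    return mask_dynamic(actual) == mask_dynamic(expected)
```

Every additional kind of dynamic field needs its own pattern, which is where the extra coding effort comes from.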

Once our application migrates to a new version, and if there are changes to the report, there is a lot of detail that needs to be revisited.

It's a similar case with the format of the report.

Sikuli script is much faster to redo in these respects.

Regards,
Sudipto.

Sudipto Paul (asudipto) said :
#3

Thanks RaiMan, that solved my question.