What are best practices for downloading documentation media into PPA package builds?

Asked by James Cuzella on 2020-05-28

This question is about PPA packaging, and specifically how to work around network restrictions with launchpad-buildd.

I have been working on packaging for a project called 'projectM' that includes Markdown-formatted docs (e.g. README.md, BUILDING.md). My first goal, building the C++ project, worked. The second goal was to get the Markdown docs installed in HTML format. This worked only locally, not after uploading and building on Launchpad (Soyuz / buildd).

After much searching through the Debian New Maintainers' Guide, and other esoteric online Wikis and resources, I figured out how to get this build working locally. It even will install the HTML docs into 'doc-base', and the README displays with locally embedded images under the package's 'dhelp' section! All the documentation files including images install locally, as they get packaged inside the .deb package during the build.

To get this working, I added 'pandoc' to Build-Depends, and added a target in 'debian/rules' to generate HTML docs from Markdown, using 'pandoc --extract-media=docs_media' like this:


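    # $(HTML_DOCS) is assumed to be defined elsewhere in debian/rules (not shown).
    # Recipe lines must begin with a tab character.
    %.html: %.md
    	pandoc --from=markdown+smart --to=html --standalone --metadata=pagetitle="ProjectM $(basename $<)" --extract-media=docs_media --output=$@ $<

    execute_before_dh_install: $(HTML_DOCS)
    	@# target cannot be empty, just no-op ':'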

To get them installed into the package, I added this to 'debian/projectm.install':

    docs_media/* usr/share/doc/projectm/docs_media/

This target downloads and embeds the referenced URLs for media files (e.g. '.jpg', '.png', '.svg') into a folder 'docs_media'. Then, during install they get copied inside and packaged under '/usr/share/doc/projectm/docs_media/'. Pandoc re-writes the HTML '<img src=' tags to reference the local image files that were downloaded. Works great locally!

This feels like a satisfying and elegant way to include fully offline versions of HTML docs generated from upstream Markdown source files. Next was to create a reproducible build via pbuilder & Launchpad.

Everything works fine on my local system using 'debuild'. However, 'pbuilder' needed a couple of extra bits. It required turning on network access (adding 'USENETWORK=yes' to the '/etc/pbuilderrc' config file). To get this working inside the pbuilder chroot, I also had to add build dependencies on: ca-certificates, openssl, netbase

After trying to add this to a PPA, I found that unfortunately the 'launchpad-buildd' farm does not allow outbound network access!
By adding some debug commands after the 'pandoc' build target, I was able to verify that a network adapter was available, but outbound connectivity seems to be disabled.

The build log shows some network DNS resolver errors during the 'pandoc --extract-media' steps:


After researching a bit more on this issue, I found some helpful resources explaining that network is disabled for security reasons, and suggesting some alternatives:


This seems to lead me down the path of choosing one of these options:

 1. Package the external resource myself, and either include it in the package itself or as a dependency.
 2. Change the build process to fetch the resource locally
 3. Delay the actual download/build process to be done after the user installs the package

My question is then: What is the best option here considering that this is for embedding images for the HTML docs? Are there some good packaging examples already out there for this use case?

Ancillary thoughts:

 - It seems like #3 would require a hard dependency on pandoc at package install time. So that one I can probably rule out as not ideal.
 - Option #2 seems to assume they are already included in the local source archive, or can be generated locally. I think we can rule that one out, given that they are web URL references and not included in the VCS git repo.
 - Option #1 sounds best, but I'm not sure how that plays with Debian's pristine source '*.orig.tar.xz' requirement, and quilt patches for .jpg, .png, or other image formats seem ugly.
   - Is there some way of including these files into the source package using a special build target or hook? (e.g.: 'execute_before_dh_<<something.. something.. source package>>', or maybe a magic 'uscan' hook)
   - Including them upstream seems more difficult to achieve given the upstream maintainers would have to agree to adding binary image files to their git repo. Probably not ideal either.
 - The older versions of this package are actually split out in Debian into 8 separate packages!
   - My next steps are to separate the new build into at least 4: projectm-data, libprojectM3, projectm-jack, projectm-pulseaudio
   - I have seen some packages that separate out docs too. Maybe I should create: projectm-docs

Solved by: Manfred Hampl

This question was reopened

Colin Watson (cjwatson) said : #1

Yes, everything you need for your build has to be either in your source package or in a build-dependency. The usual practice is that if the resource in question is specific to your package, then it should be shipped directly in your source package; if it's part of some other set of tools, then perhaps it should be shipped along with the appropriate tool and you can build-depend on it, although shipping it in your source package would be an acceptable workaround.

You should be able to just add the binary files in question to your source package and list them in debian/source/include-binaries (see "man dpkg-source"). No need to modify the original upstream tarball or mess around with uscan or whatever in this case.

James Cuzella (trinitronx) said : #2

Thanks for the help!

I was able to try out the `debian/source/include-binaries` file and I'm not sure it's going to work for this use case.

This is because I realized a couple things:

 1) `debuild -S` runs the `debian/rules clean` target first
   - The `clean` Makefile target deletes any generated HTML & image files
   - These are generated from Markdown files existing in the upstream git repo / source tarball
   - So, we would need to re-generate these somewhere after `uscan`, after the `clean` target runs, yet before `dpkg-source` runs.
   - Therefore, generation of these HTML docs & image files needs to happen before the call to `dpkg-buildpackage -S` from `debuild -S`.
 2) `debian/source/include-binaries` seems to not support wildcards?
  - Source code that handles this is in Perl, and uses a nested hash with keys directly sourced from lines in that file:
    - https://metacpan.org/source/GUILLEM/Dpkg-1.20.0/lib/Dpkg/Source/BinaryFiles.pm#L55-78
    - The `exists` check would be checking for a full filename in the keys of that hash:

        use strict;
        use warnings;
        use Data::Dumper;
        $Data::Dumper::Sortkeys = 1; # stable key order in the dumps below

        my %foo;
        # Pretend this string is a line read from `debian/source/include-binaries`
        $foo{str} = "wildcard-foo/*";
        # Let's pretend it found a hardcoded filename: "baz"
        $foo{bar}{baz} = 1;
        print Dumper($foo{str});
        # $VAR1 = 'wildcard-foo/*';

        # Here dpkg-source would store the wildcard line verbatim as a hash key
        $foo{bar}{$foo{str}} = 1;
        print Dumper($foo{bar});
        # $VAR1 = {
        #           'baz' => 1,
        #           'wildcard-foo/*' => 1
        #         };

        # The same test it makes for a found binary file with path "wildcard-foo/bar"
        print "wildcards work\n" if exists $foo{bar}{"wildcard-foo/bar"};
        # Prints nothing...
        # The inverted test from above
        print "wildcards do not work\n" if !exists $foo{bar}{"wildcard-foo/bar"};
        # Prints: wildcards do not work

  - This presents a problem for this use case because image filenames are dynamic!
    - The `pandoc --extract-media=` command appears to generate a SHA1 sum filename for each image
    - This also gets embedded into the HTML doc, so hardcoding them into the `debian/source/include-binaries` file would be difficult and require a full rewrite of that file each time the docs are generated from upstream.

So to get this method working, we would have to dynamically generate the include-binaries file list each time. I would also need to run the pandoc commands and regenerate the include-binaries (or pass dynamic flag --include-binaries to dpkg-source) somewhere in between the clean step and the source package build step where dpkg-buildpackage -S => dpkg-source calls are made.
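
A minimal sketch of what that dynamic generation might look like, assuming the extracted media lives in a `docs_media` directory at the source root and that `debian/source/include-binaries` takes one path per line relative to that root. The function name `generate_include_binaries` is hypothetical, and file names containing newlines are not handled:

```shell
#!/bin/sh
# Sketch: rebuild debian/source/include-binaries from the files that
# currently exist under a media directory, since wildcards are not
# expanded. Function name and default paths are illustrative assumptions.
generate_include_binaries() {
    media_dir="${1:-docs_media}"
    out_file="${2:-debian/source/include-binaries}"

    # Make sure debian/source/ exists before writing the list.
    mkdir -p "$(dirname "$out_file")"

    # List regular files only; sort in the C locale so the result is
    # stable between runs and diffs cleanly.
    find "$media_dir" -type f | LC_ALL=C sort > "$out_file"
}
```

Run from the top of the unpacked source tree, `generate_include_binaries docs_media` would rewrite the list after each pandoc run. It still has to be invoked between the clean step and `dpkg-source`, which is the hard part described next.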

That seemed a bit tricky, and would involve somehow using the `debuild` hook for `dpkg-source-hook`. The manpage for `debuild` seems to describe this one as the one I would need:

> dpkg-source-hook
> Run after cleaning the tree and before running dpkg-source. (Run even if dpkg-source is not being called because -b, -B, or -A is used.)
> Hook is run inside the unpacked source.
> Corresponds to dpkg's source hook.

This means I'd have to remember to call `debuild` with the proper hook flag each time, or set `DEBUILD_DPKG_SOURCE_HOOK` in `/etc/devscripts.conf` or `~/.devscripts`. Both of those files exist outside the `debian/` directory of the package, so this configuration wouldn't propagate with the package. (e.g.: if some other maintainer picks it up later, they would have to have hidden knowledge to configure the hook).

So, unfortunately I ended up going down the other path of messing with a `uscan` custom script to repack the source. I ended up finding a repack script example here:


This seemed pretty handy, and would mean that everything could be automated in the `debian/repack` and `debian/watch` files, which would get shipped along with the source package! Seems like a better solution so as to not forget magic `devscripts.conf` or `debuild` flags later!

It turns out that this old script example is outdated now and was based on `version=3` of the `debian/watch` file & `uscan` CLI API. I was able to code around this and get a naive implementation of a repack script that does what I need. I was also able to figure out how to remove copyrighted ICC color profiles from the images and use a free alternative sRGB ICC profile from the `colord` package. This was after digging through some Debian mailing list threads where they were discussing how to implement DFSG license requirements for ICC profiles that were embedded in some images, pdfs, and other files. They used exiftool to remove these, so I added that to the repack script.

First thread: https://lists.debian.org/debian-devel/2014/05/msg00312.html
Continued: https://lists.debian.org/debian-devel/2014/05/msg00812.html

After working around the new `version=4` `uscan` script call limitations (it now only calls "script --upstream-version version", instead of "script --upstream-version version ../spkg_version.orig.tar.gz"), it works!

Reference: https://www.mankier.com/1/uscan#History_and_Upgrading
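
As a rough illustration of that calling convention, here is a sketch of how a repack script could read the version itself and derive the tarball path that `uscan` no longer passes in. Both function names are hypothetical, and the package name is passed in by hand here rather than read from `debian/changelog`:

```shell
#!/bin/sh
# Sketch of argument handling for a uscan version=4 custom script, which
# is assumed to be called as: script --upstream-version <version>
# Function names are illustrative, not part of devscripts.

# Print the value following --upstream-version; fail if it is missing.
parse_upstream_version() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --upstream-version)
                [ "$#" -ge 2 ] || return 1
                printf '%s\n' "$2"
                return 0
                ;;
        esac
        shift
    done
    return 1
}

# Build the conventional path of the orig tarball in the parent directory.
orig_tarball_path() {
    pkg="$1"
    version="$2"
    printf '../%s_%s.orig.tar.xz\n' "$pkg" "$version"
}
```

Inside the script this could be used as `version=$(parse_upstream_version "$@") && orig_tarball_path projectm "${version}+dfsg"`, which yields a name of the same form as the `projectm_3.1.3+dfsg.orig.tar.xz` file mentioned below.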

I just hardcoded the output tarball with a "+dfsg-1" versioned filename for the `.orig.tar.xz`. This worked well the first time: it ran all the way through using just `uscan --verbose`, and after running `debuild -S`, I was able to get a package into the PPA!

After pulling the built package down from the PPA, I noticed that exiftool had left behind some `*_original` files...
So I fixed the bug in my script by calling `exiftool -overwrite_original`, then tried to upload the `*.orig.tar.xz` again.

Now Soyuz gave me an error:

    File projectm_3.1.3+dfsg.orig.tar.xz already exists in projectM PPA, but uploaded version has different contents. See more information about this error in https://help.launchpad.net/Packaging/UploadErrors.
    Files specified in DSC are broken or missing, skipping package unpack verification.
    and repack the source tarball, or get them installed into the source package somehow

James Cuzella (trinitronx) said : #3

Oops! My keyboard decided to tab onto the "This Solved My Problem" button too soon and hitting space activated it while I was still typing... lol

So, I suppose I found a way to get the repack option working and automated through `uscan` custom script. However, there seems to be a filename conflict with the way Soyuz is expecting the `.orig.tar.xz` file to be created by this script. I looked at the `/usr/bin/uupdate` script, and it seems to be doing quite a bit of non-trivial version and filename mangling now. There doesn't seem to be any other way to hook into `uupdate` to benefit from that, so it looks like I might need to re-implement some of that stuff in my custom `uscan` script.

Does this sound like a reasonable path to take?

If you or anyone else is curious about what I'm working with so far, here are the changes I've added to the git branch `add-debian-packaging`:


James Cuzella (trinitronx) said : #4

Update: I've found my way around the issue with `dput` by pushing the same versioned filename to Soyuz using `gbp dch` with a custom `+dfsg.1` version string:

    gbp dch --debian-branch=add-debian-packaging --new-version=$(lsb_release -cs)~ppa1 --distribution=$(lsb_release -cs ) --dch-opt='-b'

I was able to get the source package to successfully repack, generate the HTML docs via pandoc, and finally remove non-free ICC color profiles from the images (and also those that were included in the upstream repo). The latest working `debian/repack` script is on the branch:


Now I just have some lintian warnings about the version string being malformed somehow:

    E: projectm source: malformed-debian-changelog-version (for non-native)

I was able to figure out that uscan looks for a RegExp pattern called "DEB_EXT", which matches on the "dfsg.N" part of the version string. When I renamed this version to use a dash instead of the period (e.g.: "dfsg-1"), that lintian warning goes away.

I guess I need to figure out how git-buildpackage's version string interacts with the "dfsg.N" pattern... it seems like most "dfsg" Ubuntu packages use a dash instead of a period. It's a bit confusing and seems a bit off, because the `uscan` RegExp pattern looks for a literal dot / period, also anchored to the END of the version string:

    ANY_VERSION => '(?:[-_]?(\d[\-+\.:\~\da-zA-Z]*))',
    ARCHIVE_EXT => '(?i)(?:\.(?:tar\.xz|tar\.bz2|tar\.gz|zip|tgz|tbz|txz))',
    DEB_EXT => '(?:[\+~](debian|dfsg|ds|deb)(\.)?(\d+)?$)',
    use constant SIGNATURE_EXT => ARCHIVE_EXT . '(?:\.(?:asc|pgp|gpg|sig|sign))';

    s/\@ARCHIVE_EXT\@/ARCHIVE_EXT/ge;
    s/\@DEB_EXT\@/DEB_EXT/ge;
    my $line = Devscripts::Uscan::WatchLine->new({

    # If dversionmangle is "auto", replace it by
    # DEB_EXT removal
    $_ eq 'auto'
    ? ('s/'
    . &Devscripts::Uscan::WatchFile::DEB_EXT
    . '//')
    : ($_)
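
To see what that pattern does to the two candidate suffixes, here is a small sketch that applies a POSIX ERE translation of "DEB_EXT" with `sed`. The helper name `strip_deb_ext` is hypothetical and not part of devscripts; the translation (`\d` becomes `[0-9]`) is my own and only approximates the Perl original:

```shell
#!/bin/sh
# Sketch: remove a trailing DEB_EXT-style suffix from a version string,
# mimicking the 's/DEB_EXT//' substitution built for dversionmangle=auto.
# Perl pattern: (?:[\+~](debian|dfsg|ds|deb)(\.)?(\d+)?$)
strip_deb_ext() {
    printf '%s\n' "$1" | sed -E 's/[+~](debian|dfsg|ds|deb)(\.)?([0-9]+)?$//'
}
```

With this, `strip_deb_ext '3.1.3+dfsg.1'` and `strip_deb_ext '3.1.3+dfsg'` both print `3.1.3`, while `strip_deb_ext '3.1.3+dfsg-1'` prints the string unchanged, because the `-1` sits between the suffix and the end-of-string anchor.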

The "gbp dch" command seems to not allow a suffix. Anything passed to `--new-version` gets appended with the `git` commit SHA.

So, based on this, I changed my "DEB_EXT" pattern to also match a dash "-" as well as a period, and not to anchor the match to the end of the string. This still seems a bit unofficial and I'm worried about breakage. The "+dfsg" dash pattern I'm seeing may be a hack that everyone's been doing to get around the lintian warnings, but it's not clear or really discussed much online. I did find these discussions, and DEP-14 which seems reasonably reliable:


Some of the tools seem to apply to upstream version mangling, versus the Debian versioning scheme. As far as I can tell, the upstream tarball or git tag version needs the "+dfsg.N" suffix for `uscan`, which usually gets passed to `uupdate --upstream-version`, or to a custom `debian/repack --upstream-version` script call. Yet both `uupdate` and my `debian/repack` script read the release version from `debian/changelog`, which seemingly needs to follow the pattern enforced by `lintian`, with a dash instead of a period.

Things get a bit stranger and messier when looking at real packages in the wild too...

Just looking at the output of `dpkg -l | grep -i dfsg` shows that most packages follow the "dfsg-N" pattern (with or without Debian versioning like "NubuntuN" or "NbuildN"), and only a few follow the "dfsg.N" pattern, usually suffixed with a dash and the Debian / Ubuntu version string, with some having only a final "-N":

    $ dpkg -l | grep -i 'dfsg\.'
    ii apg 2.2.3.dfsg.1-5 amd64 Automated Password Generator - Standalone version
    ii info 6.7.0.dfsg.2-5 amd64 Standalone GNU Info documentation browser
    ii install-info 6.7.0.dfsg.2-5 amd64 Manage installed documentation in info format
    ii libdns66 1:9.7.1.dfsg.P2-2ubuntu0.3 amd64 DNS Shared Library used by BIND
    ii libgs8 8.71.dfsg.2-0ubuntu7 amd64 The Ghostscript PostScript/PDF interpreter Library
    ii libisc60 1:9.7.1.dfsg.P2-2ubuntu0.3 amd64 ISC Shared Library used by BIND
    ii libisccfg60 1:9.7.1.dfsg.P2-2ubuntu0.3 amd64 Config File Handling Library used by BIND
    ii libtag1v5:amd64 1.11.1+dfsg.1-0.3ubuntu2 amd64 audio meta-data library
    ii libtag1v5-vanilla:amd64 1.11.1+dfsg.1-0.3ubuntu2 amd64 audio meta-data library - vanilla flavour
    ii libtheora0:amd64 1.1.1+dfsg.1-15ubuntu2 amd64 Theora Video Compression Codec
    ii libtheora0:i386 1.1.1+dfsg.1-15ubuntu2 i386 Theora Video Compression Codec
    ii projectm amd64 PulseAudio module for projectM providing projectm-pulseaudio
    ii texinfo 6.7.0.dfsg.2-5 amd64 Documentation system for on-line information and printed output

Seems like Debian versioning is overly complicated, and the tooling that checks these is a bit esoteric. This was my best attempt at trying to figure it out, and I'm still wondering why `lintian` and the other tools like `uscan` and `uupdate` seem to want different patterns for "+dfsg" type packages.

Any idea what the official way to version a DFSG package might be for Ubuntu, or how this fits into the `git-buildpackage`, `uscan`, and `lintian` tools workflow? Does this even really matter for a PPA?

James Cuzella (trinitronx) said : #6

Thanks! I must've been too sleep deprived while reading some of those docs earlier this week. Taking the time again to re-read some of those helped. Specifically, the "Version" section in the "Control files and their fields" doc cleared up the confusion for me:


> The version number of a package. The format is: [epoch:]upstream_version[-debian_revision].

The one currently used was:

I noticed it's missing a dash, and instead was using the tilde "~" character. That's why "+dfsg-1" worked to pass lintian, but "+dfsg.1" did not.

I think the main confusing bit was about tilde `~` vs. dash `-`, which actually matters because it's the separator between upstream & debian version parts of the full package version string. I must've copied the version string passed to `gbp dch --new-version=` from some other PPA example or tutorial that really didn't understand the more strict version semantics.

The reason that they must've used tilde `~` could have been due to the sort ordering, which is explained pretty well here:


So, to summarize for future package maintainers:

 - The "--upstream-version" flag passed between `uscan` and `uupdate` is only concerned directly with the "upstream_version" part of the full release version.
   - The full pattern inside `uupdate` or custom `uscan` script (e.g.: `debian/repack`) is then:

    release_version = [epoch:]upstream_version[-debian_revision]

   - The other tools, such as `gbp dch`, `dch`, `dpkg --compare-versions` and `dpkg-parsechangelog --show-field "Version"`, all use the full Debian changelog / control file version string (the `release_version` above).
 - All the rules in those linked Debian & Ubuntu docs about version string characters and sorting still apply.
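
To make the split concrete, here is a small sketch that pulls the three parts out of a full version string using only POSIX parameter expansion. The function names are hypothetical; it assumes the epoch ends at the first colon and the Debian revision starts after the last hyphen:

```shell
#!/bin/sh
# Sketch: split [epoch:]upstream_version[-debian_revision] into parts.
# Function names are illustrative only.

# Print the epoch, or an empty line if there is none.
version_epoch() {
    case "$1" in
        *:*) printf '%s\n' "${1%%:*}" ;;
        *)   printf '\n' ;;
    esac
}

# Print the Debian revision (text after the last hyphen), or an empty line.
version_revision() {
    rest="${1#*:}"                              # drop the epoch, if any
    case "$rest" in
        *-*) printf '%s\n' "${rest##*-}" ;;
        *)   printf '\n' ;;
    esac
}

# Print the upstream version (between the epoch and the last hyphen).
version_upstream() {
    rest="${1#*:}"                              # drop the epoch, if any
    case "$rest" in
        *-*) printf '%s\n' "${rest%-*}" ;;      # drop the revision, if any
        *)   printf '%s\n' "$rest" ;;
    esac
}
```

For `3.1.3+dfsg-1`, `version_upstream` prints `3.1.3+dfsg` and `version_revision` prints `1`, so the "+dfsg" suffix ends the upstream part. A string with a tilde and no hyphen has no Debian revision at all, and the whole string is treated as the upstream version.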

James Cuzella (trinitronx) said : #7

Thanks Manfred Hampl, that solved my question.