zlib: improve crc32 performance on P8

Bug #1742941 reported by bugproxy
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
The Ubuntu-power-systems project
Fix Released
Low
Ubuntu on IBM Power Systems Bug Triage
zlib (Ubuntu)
Fix Released
Low
Canonical Foundations Team

Bug Description

Calculate the checksum of data that is 16 byte aligned and a multiple of 16 bytes.

The first step is to reduce it to 1024 bits. We do this in 8 parallel
 chunks in order to mask the latency of the vpmsum instructions. If we
 have more than 32 kB of data to checksum we repeat this step multiple
 times, passing in the previous 1024 bits.

 The next step is to reduce the 1024 bits to 64 bits. This step adds
 32 bits of 0s to the end - this matches what a CRC does. We just
 calculate constants that land the data in this 32 bits.

 We then use fixed point Barrett reduction to compute a mod n over GF(2)
 for n = CRC using POWER8 instructions. We use x = 32.

 http://en.wikipedia.org/wiki/Barrett_reduction

Revision history for this message
bugproxy (bugproxy) wrote : 0001-Change-crc32-to-detect-optimized-code-and-add-power8.patch

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-136495 severity-low targetmilestone-inin1804
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → zlib (Ubuntu)
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-01-12 08:22 EDT-------
>'ve created the following which hopefully contains all the features
> required to be upstream friendly.
>
> https://github.com/grooverdan/zlib/commits/power_crc32_c_version
>
> Most importantly compared to the previous PR, it contains the right crc32
> constants.
>
> If you see anything that upstream/distros or anyone else might nit about
> then please tell me (on github or here). The upstream isn't very active so
> presenting something idea on day one is my goal.
>
> Note 1: no clang currently due to...
>
> Note 1a: crc32_vpmsum: __builtin_pack_vector_int128 and
> __builtin_crypto_vpmsumw/__builtin_crypto_vpmsumd - need to review
> https://github.com/racardoso/crc32-vpmsum/commit/
> 97210b9188916eb46377b5eb927ae337948bf016 properly (sorry Rogerio, from
> memory it needs to compile fail for clang BE rather than generate wrong
> results).
>
> Note 1b: clang - no __builtin_cpu_supports (
> https://bugs.llvm.org/show_bug.cgi?id=35898 ) so I might fall back to
> getauxval
>
>
> Is there an better define than __powerpc__ to check in configure to detect
> Power8+ only?
>
> Other suggestions welcome.

Changed in ubuntu-power-systems:
importance: Undecided → Low
assignee: nobody → David Britton (davidpbritton)
tags: added: triage-g
Manoj Iyer (manjo)
Changed in zlib (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → David Britton (davidpbritton)
importance: Undecided → Low
Changed in ubuntu-power-systems:
status: New → Triaged
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-01-17 22:56 EDT-------
code submitted upstream: https://github.com/madler/zlib/pull/335

David Britton (dpb)
Changed in zlib (Ubuntu):
assignee: David Britton (davidpbritton) → Patricia Gaughen (gaughen)
Changed in ubuntu-power-systems:
assignee: David Britton (davidpbritton) → nobody
Changed in zlib (Ubuntu):
assignee: Patricia Gaughen (gaughen) → Canonical Foundations Team (canonical-foundations)
Revision history for this message
Matthias Klose (doko) wrote :

this is not yet committed upstream (2018-01-30).

Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
status: Triaged → Incomplete
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Revision history for this message
Manoj Iyer (manjo) wrote :

IBM, Can you please update this bug when this patch lands upstream?

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-02 08:58 EDT-------
(In reply to comment #45)
> IBM, Can you please update this bug when this patch lands upstream?

Yes, but I do not think we can make 18.04 anymore. Deferring to 18.10.

tags: added: targetmilestone-inin1810
removed: targetmilestone-inin1804
Manoj Iyer (manjo)
summary: - zlib: improve crc32 performance on P8
+ [18.10] zlib: improve crc32 performance on P8
Revision history for this message
Manoj Iyer (manjo) wrote : Re: [18.10] zlib: improve crc32 performance on P8

IBM, 18.10 feature freeze is on the 23rd of Aug, any news on the upstream status of this patch?

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-08-20 14:53 EDT-------
> IBM, 18.10 feature freeze is on the 23rd of Aug, any news on the upstream
> status of this patch?

The patch has been waiting for the zlib maintainer since Feb. 6th.
Likewise for all the pull requests submitted since January: https://github.com/madler/zlib/pulls?q=is%3Aopen+is%3Apr

Canonical, would it be possible to carry this patchset out-of-tree?

Revision history for this message
Manoj Iyer (manjo) wrote : Re: [18.10] zlib: improve crc32 performance on P8

Looks like this patch is still waiting to be merged upstream, looking at this link: https://github.com/madler/zlib/pull/335 Could you please advice if this is being considered to be merged upstream by the maintainer?

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2019-04-08 18:59 EDT-------
Significant background talks are taking place with Mark Adler about the maintenance of zlib for arch specific performance.

Lets see what happens in the next 3 months.

Changed in zlib (Ubuntu):
status: New → Incomplete
Revision history for this message
Andrew Cloke (andrew-cloke) wrote : Re: [18.10] zlib: improve crc32 performance on P8

<email address hidden>: has there been any outcome from the upstream talks with Mark Adler?

summary: - [18.10] zlib: improve crc32 performance on P8
+ zlib: improve crc32 performance on P8
Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Also, removing "18.10" from the title, as this release has EOLed.

Changed in ubuntu-power-systems:
assignee: Canonical Foundations Team (canonical-foundations) → nobody
Changed in zlib (Ubuntu):
assignee: Canonical Foundations Team (canonical-foundations) → nobody
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
importance: Low → Medium
Changed in zlib (Ubuntu):
importance: Low → Medium
Changed in ubuntu-power-systems:
importance: Medium → Low
Changed in zlib (Ubuntu):
importance: Medium → Low
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-02-06 22:14 EDT-------
Where we are up to is there is a small amount of progress in the zlib-devel (https://zlib.net/mailman/listinfo/zlib-devel_madler.net) however nothing explicit on or off list about merging arch specific patches (from any architecture vendor).

At the request of the community and other vendors, we have added explicit crc32 tests to the zlib and these have been pushed to that upstream PR.
https://github.com/madler/zlib/pull/335

Since the patch on this bug report, the code has been changed from ASM to C (to increase the portability ), conformed closer to an upstream style as far a location and interfaces, and improved the test suite as mentioned.

The zlib1g-dev package in focal (https://packages.ubuntu.com/focal/zlib1g-dev) still uses configure && make which the upstream PR patches correctly (notably I haven't patched CMakefiles as of today). Building `make crc32_test && ./crc32_test` as a test can be done quickly in the package build process to validate the accuracy.

We hope this prudence and validation in the upstream PR https://github.com/madler/zlib/pull/335 can be accepted for the ubuntu-20.04 LTS release of the zlib package while we continue to work with upstream on maintenance.

Thanks for your consideration.

Frank Heimes (fheimes)
Changed in zlib (Ubuntu):
status: Incomplete → Triaged
Changed in ubuntu-power-systems:
status: Incomplete → Triaged
Changed in zlib (Ubuntu):
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Frank Heimes (fheimes)
tags: added: rls-ff-incoming
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
Matthias Klose (doko)
Changed in zlib (Ubuntu):
status: Triaged → In Progress
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Triaged → In Progress
Revision history for this message
Frank Heimes (fheimes) wrote :

The crc32 performance optimization for P8 were added to zlib version 1.2.11.dfsg-2ubuntu1 (thx to doko) and landed already in proposed:
"zlib | 1:1.2.11.dfsg-2ubuntu1 | focal-proposed | source"
Hence changing the status of this ticket to Fix Committed.

Changed in zlib (Ubuntu):
status: In Progress → Fix Committed
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zlib - 1:1.2.11.dfsg-2ubuntu1

---------------
zlib (1:1.2.11.dfsg-2ubuntu1) focal; urgency=medium

  * Merge with Debian; remaining changes:
    - Build x32 packages
    - debian/zlib-core.symbols: Drop dfsg suffix from version
    - Add watch file, with GPG tarball checking, and version mangling
    - Drop unused patches
    - Cherry-pick Permit-a-deflateParams-parameter-change-asap.patch:
      (LP: #1692870)
    - Cherrypick PR#410 to enable hardware-accelerated deflate.
    - Copmile with DFLTCC enabled on s390x. LP: #1823157
    - Improve crc32 performance on P8, proposed upstream patch. LP: #1742941.

zlib (1:1.2.11.dfsg-2) unstable; urgency=low

  * Acknowledge previous NMUs (closes: #949388).
  * Remove zlib1g-dbg in favour of dbgsym (closes: #497831, #926161).
  * Rename stage1 to nobiarch, patch no longer applies due to
    uncoordinated NMUs (closes: #892762).
  * Debhelper has renamed -s to -a.
  * Policy 4.5.0 (no changes).

zlib (1:1.2.11.dfsg-1.2ubuntu1) focal; urgency=medium

  * Merge with Debian; remaining changes:
    - Build x32 packages
    - debian/zlib-core.symbols: Drop dfsg suffix from version
    - Add watch file, with GPG tarball checking, and version mangling
    - Drop unused patches
    - Cherry-pick Permit-a-deflateParams-parameter-change-asap.patch:
      (LP: #1692870)
    - Cherrypick PR#410 to enable hardware-accelerated deflate.
    - Copmile with DFLTCC enabled on s390x. LP: #1823157
  * Improve crc32 performance on P8, proposed upstream patch. LP: #1742941.

zlib (1:1.2.11.dfsg-1.2) unstable; urgency=medium

  * Non-maintainer upload.
  * Fix ftbfs on mips64el caused by the previous NMU by
    remove it from 32-ARCHS to keep NMU minial.

zlib (1:1.2.11.dfsg-1.1) unstable; urgency=medium

  * Non-maintainer upload.
  * Enable lib64 for mipsn32 mipsn32el mipsr6 mipsr6el mipsn32r6 mipsn32r6el x32
    help to add binutils-host64 for 32bit architectures (Closes: 949388)
  * Remove outdated binutils version requirement for mips/mipsel.

 -- Matthias Klose <email address hidden> Tue, 25 Feb 2020 16:59:52 +0100

Changed in zlib (Ubuntu):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.