As a member of the Hyperscale SIG, Intel maintains a “hyperscale-intel” repository with optimized versions of packages, to help customers maximize their performance on Intel® architectures. As part of this work, we have released an optimized version of the Zlib package for both CentOS Stream 8 and CentOS Stream 9 yielding significant performance gains.
The optimized Zlib package is identical to the CentOS Stream Zlib package, except that it adds two alternative implementations [1] that provide a fast, high-quality hash function on systems supporting SSE2 and AVX2 instructions. These yield performance gains of roughly 11% and 18%, respectively.
Performance Improvement Data
We evaluated the performance of our optimized versions in two distinct ways: 1) zpipe deflate, and 2) GnuPG encryption. The results show an improvement in both cases. For all tests, we averaged the results of 10 runs on both CentOS Stream 8 and CentOS Stream 9, running on the Intel® platform described in [2].
Zpipe Deflate
Since zpipe [3] directly uses Zlib's deflate() function, in this test we recompiled zpipe linking against the enhanced Zlib, then used it to compress a 750MB file.
zpipe < qt-everywhere-opensource-src-5.0.0.tar > /dev/null 2>&1
Compared to the standard version of Zlib, our optimized version yielded 4.3-5.2% and 16.1-16.2% speedups for the SSE and AVX2 implementations, respectively.
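The measurement described above (10 runs, averaged) could be scripted roughly as follows. This is a minimal sketch rather than the harness used for the published numbers: the helper names avg_seconds and deflate_once, the ZPIPE variable, and the gcc build line are assumptions made for illustration, and the script relies on GNU date and awk.

```shell
# Sketch: average the wall-clock time of a command over several runs.
# Assumes GNU date (for %N) and awk, as shipped with CentOS Stream.

# avg_seconds RUNS COMMAND [ARGS...]
# The body runs in a subshell "( ... )" so its variables stay local.
avg_seconds() (
    runs=$1
    shift
    total=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s.%N)              # seconds.nanoseconds before the run
        "$@" > /dev/null 2>&1             # discard output, as in the article
        end=$(date +%s.%N)
        total=$(awk -v t="$total" -v s="$start" -v e="$end" \
            'BEGIN { printf "%.6f", t + (e - s) }')
        i=$((i + 1))
    done
    # Print the mean elapsed time in seconds, to millisecond precision.
    awk -v t="$total" -v n="$runs" 'BEGIN { printf "%.3f\n", t / n }'
)

# One deflate pass over the file given as $1, mirroring the article's command.
# ZPIPE is a hypothetical variable naming the zpipe binary under test.
deflate_once() {
    "${ZPIPE:-./zpipe}" < "$1"
}

# Example (not executed here):
#   gcc -O2 -o zpipe zpipe.c -lz   # assumed build line; links the installed zlib
#   avg_seconds 10 deflate_once qt-everywhere-opensource-src-5.0.0.tar
```

Running the same script once with the stock Zlib installed and once with the optimized package gives the two averages to compare.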
GnuPG Encrypt
Since GnuPG uses Zlib compression, we measured how long it takes to encrypt 128MB, 256MB, 512MB, and 1GB sized sample files using GnuPG.
dd if=/dev/urandom of=encryptfile bs=1M count=128/256/512/1024
echo 1234567890 | gpg -c --no-options --batch --yes --passphrase-fd 0 -o /dev/null encryptfile 2>&1
NOTE: We did not use the common Open Benchmarking GnuPG test [4] for this test, because it generates its input file from /dev/zero, making it an unrealistic representation of performance for most users. Our test scenario is otherwise identical to theirs, but uses a more realistic input file generated from /dev/urandom.
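The GnuPG test above can be wrapped in a loop over the four file sizes. The sketch below reuses the dd and gpg commands from this section unchanged. The function names (make_sample, encrypt_once, bench_sizes) and the output format are assumptions for illustration, not the script used to produce the published results, and GNU dd, date, and awk are assumed.

```shell
# Sketch: time GnuPG symmetric encryption for each sample size.
# Assumes GNU dd/date and awk; gpg is only needed when bench_sizes is called.

# make_sample FILE SIZE_MB: fill FILE with SIZE_MB mebibytes of random data.
make_sample() {
    dd if=/dev/urandom of="$1" bs=1M count="$2" 2> /dev/null
}

# encrypt_once FILE: same gpg invocation as in the article.
encrypt_once() {
    echo 1234567890 | gpg -c --no-options --batch --yes --passphrase-fd 0 \
        -o /dev/null "$1" 2>&1
}

# bench_sizes [RUNS]: print the average seconds per encryption for each size.
# The subshell body "( ... )" keeps loop variables out of the caller's shell.
bench_sizes() (
    runs=${1:-10}
    for size_mb in 128 256 512 1024; do
        make_sample encryptfile "$size_mb"
        total=0
        i=0
        while [ "$i" -lt "$runs" ]; do
            start=$(date +%s.%N)
            encrypt_once encryptfile > /dev/null
            end=$(date +%s.%N)
            total=$(awk -v t="$total" -v s="$start" -v e="$end" \
                'BEGIN { printf "%.6f", t + (e - s) }')
            i=$((i + 1))
        done
        awk -v t="$total" -v n="$runs" -v m="$size_mb" \
            'BEGIN { printf "%d MB: %.3f s average\n", m, t / n }'
    done
    rm -f encryptfile
)

# Example (not executed here):
#   bench_sizes 10
```

Because the input comes from /dev/urandom, each sample file differs between invocations. Generating the file once per size and reusing it for all runs, as done here, keeps the runs within one size comparable.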
Compared to the standard version of Zlib, our optimized version yielded 11.5-13.2% and 16.0-18.1% speedups for the SSE and AVX2 implementations, respectively. The performance gains were fairly consistent across file sizes, though they improved slightly as file size increased.
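For readers reproducing these measurements, a percentage like the ones quoted here can be derived from two averaged timings. The sketch below assumes that "speedup" means the reduction in elapsed time relative to the stock Zlib run. The article does not spell out its formula, so that definition, the helper name pct_improvement, and the sample timings are all assumptions.

```shell
# Sketch: percentage reduction in elapsed time, relative to the baseline.
# pct_improvement BASELINE_SECONDS OPTIMIZED_SECONDS
pct_improvement() {
    awk -v base="$1" -v opt="$2" \
        'BEGIN { printf "%.1f\n", (base - opt) / base * 100 }'
}

# Hypothetical timings (not taken from the article):
#   pct_improvement 10.0 8.4    # prints 16.0
```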
How To Use
These optimized Zlib packages are available now in the “hyperscale-intel” repository through the Hyperscale SIG. For more information on the Hyperscale SIG and how to use its repositories, please see its documentation [5].
The packages released in the “hyperscale-intel” repo have the more-performant Intel® Advanced Vector Extensions 2 (Intel® AVX2) implementations enabled, as they will yield the best results for most users. If you have a significantly older system without Intel AVX2 support, but still want to take advantage of the Intel® Streaming SIMD Extensions (Intel® SSE) enhancements, you can rebuild our package with the Intel SSE implementation enabled instead. To do so, delete the “CFLAGS="$CFLAGS -DAVX2_SLIDE"” line in the spec file [6].
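The check and the spec-file change described above might look like the following sketch. The helper names has_avx2 and disable_avx2_slide are hypothetical, the sed command removes every line containing -DAVX2_SLIDE (assumed to be only the CFLAGS line mentioned above), and the rpmbuild step in the comment is an assumed rebuild workflow rather than a documented procedure. GNU sed is assumed for the -i option.

```shell
# Sketch: detect AVX2 support and switch the spec file to the SSE build.

# has_avx2 [CPUINFO_FILE]: succeed if the CPU flags list includes avx2.
# Defaults to /proc/cpuinfo; the optional argument is for testing.
has_avx2() {
    grep -qw avx2 "${1:-/proc/cpuinfo}"
}

# disable_avx2_slide SPEC_FILE: delete the line(s) that add -DAVX2_SLIDE.
disable_avx2_slide() {
    sed -i -e '/-DAVX2_SLIDE/d' "$1"
}

# Example (not executed here; zlib.spec path and rebuild step are assumptions):
#   if ! has_avx2; then
#       disable_avx2_slide zlib.spec
#       rpmbuild -ba zlib.spec
#   fi
```

Checking /proc/cpuinfo first avoids an unnecessary rebuild on systems where the packaged AVX2 build already applies.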
References
- [1] https://git.centos.org/rpms/zlib/blob/c8s-sig-hyperscale-intel/f/SOURCES/zlib-1.2.11-x86_64-accelrated-slide-hash.patch
- [2] Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. Intel® Xeon® Configuration: 1-node, 2x Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz, 36 cores, HT On, Turbo On, Total Memory 256GB (4x64GB DDR4 3200 MT/s [3200 MT/s]), BIOS WLYDCRB1.SYS.0027.P82.2204080829, microcode 0xd000375, 2x Ethernet Controller X550, 1x I210 Gigabit Network Connection, 1x 894.3G INTEL SSDSC2KG96, CentOS Stream 8 (kernel 4.18.0-485.el8.x86_64) and CentOS Stream 9 (kernel 5.14.0-295.el9.x86_64), GnuPG 2.2.20 (C8S) and 2.3.3 (C9S) / Zlib 1.2.11, gcc 8.5.0 (C8S) and 11.3.1 (C9S), test by Intel on 02/22/2023.
- [3] https://github.com/madler/zlib/blob/v1.2.11/examples/zpipe.c
- [4] https://openbenchmarking.org/test/pts/gnupg
- [5] https://sigs.centos.org/hyperscale/
- [6] https://git.centos.org/rpms/zlib/blob/c8s-sig-hyperscale-intel/f/SPECS/zlib.spec#_159
Notices & Disclaimers
Performance varies by use, configuration and other factors. Learn more on the Performance Index site.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software or service activation.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.