Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

lib/zlib: add s390 hardware support for kernel zlib_deflate

Patch series "S390 hardware support for kernel zlib", v3.

With the IBM z15 mainframe, the new DFLTCC instruction is available. It
implements the deflate algorithm in hardware (Nest Acceleration Unit - NXU),
with estimated compression and decompression performance orders of
magnitude faster than the current zlib.

This patchset adds s390 hardware compression support to kernel zlib.
The code is based on the userspace zlib implementation:

https://github.com/madler/zlib/pull/410

The coding style is also preserved for future maintainability. Only a
limited set of userspace zlib functions is represented in the kernel.
Apart from that, all the memory allocation should be performed in
advance. Thus, the workarea structures are extended with the parameter
lists required for the DEFLATE CONVERSION CALL instruction.

Since kernel zlib itself does not support gzip headers, only the Adler-32
checksum is processed (it can also be produced by the DFLTCC facility). As
in the userspace implementation, kernel zlib will compress in hardware
on level 1, and in software on all other levels. Decompression will
always happen in hardware (when enabled).
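
The level-based dispatch described above can be sketched in plain userspace C. This is only an illustration, assuming a level mask in which bit n enables hardware compression for level n (the patch defines DFLTCC_LEVEL_MASK as 0x2, i.e. level 1 only). The names `LEVEL_MASK_LEVEL1_ONLY` and `use_hardware_deflate` are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit n of the mask enables hardware compression for level n.
 * 0x2 has only bit 1 set, so only level 1 is offloaded. */
#define LEVEL_MASK_LEVEL1_ONLY 0x2UL

/* Hypothetical helper: should this compression level go to hardware? */
static bool use_hardware_deflate(int level, unsigned long level_mask)
{
    assert(level >= 0 && level <= 9); /* zlib levels are 0..9 */
    return (level_mask & (1UL << level)) != 0;
}
```

With this mask, level 1 selects the hardware path and every other level falls back to software.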

Two DFLTCC compression calls produce the same results only when they
both are made on machines of the same generation, and when the
respective buffers have the same offset relative to the start of the
page. Therefore, care should be taken when using hardware compression
if reproducible results are desired. However, it always produces
standard-conforming output that can be inflated regardless.

The new kernel command line parameter 'dfltcc' is introduced to
configure s390 zlib hardware support:

Format: { on | off | def_only | inf_only | always }
    on:       s390 zlib hardware support for compression on
              level 1 and decompression (default)
    off:      No s390 zlib hardware support
    def_only: s390 zlib hardware support for deflate
              only (compression on level 1)
    inf_only: s390 zlib hardware support for inflate
              only (decompression)
    always:   Same as 'on' but ignores the selected compression
              level always using hardware support (used for debugging)
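
The option values listed above map naturally onto a small lookup table. The following is a hypothetical userspace sketch, not the kernel's actual parameter handling (which this patch does not contain): the enum, `parse_dfltcc_mode`, `mode_allows_hw_deflate` and `mode_allows_hw_inflate` are all invented names, and the per-mode behavior is taken only from the descriptions above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mode values mirroring the documented option strings. */
enum dfltcc_mode {
    MODE_ON, MODE_OFF, MODE_DEF_ONLY, MODE_INF_ONLY, MODE_ALWAYS, MODE_INVALID
};

/* Map a "dfltcc=" value to a mode; unknown strings are rejected. */
static enum dfltcc_mode parse_dfltcc_mode(const char *arg)
{
    static const struct { const char *name; enum dfltcc_mode mode; } table[] = {
        { "on", MODE_ON },
        { "off", MODE_OFF },
        { "def_only", MODE_DEF_ONLY },
        { "inf_only", MODE_INF_ONLY },
        { "always", MODE_ALWAYS },
    };
    size_t i;

    assert(arg != NULL);
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strcmp(arg, table[i].name) == 0)
            return table[i].mode;
    return MODE_INVALID;
}

/* on/def_only compress in hardware on level 1 only; always ignores the level. */
static int mode_allows_hw_deflate(enum dfltcc_mode mode, int level)
{
    switch (mode) {
    case MODE_ON:
    case MODE_DEF_ONLY:
        return level == 1;
    case MODE_ALWAYS:
        return 1;
    default:
        return 0;
    }
}

/* on/inf_only/always decompress in hardware. */
static int mode_allows_hw_inflate(enum dfltcc_mode mode)
{
    return mode == MODE_ON || mode == MODE_INF_ONLY || mode == MODE_ALWAYS;
}
```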

The main purpose of the integration of the NXU support into the kernel
zlib is the use of hardware deflate in the btrfs filesystem with on-the-fly
compression enabled. Apart from that, hardware support can also be used
during boot for decompressing the kernel or the ramdisk image.

With the patch for btrfs expanding zlib buffer from 1 to 4 pages (patch
6) the following performance results have been achieved using the
ramdisk with btrfs. These are relative numbers based on throughput rate
and compression ratio for zlib level 1:

  Input data              Deflate rate   Inflate rate   Compression ratio
                          NXU/Software   NXU/Software   NXU/Software
  stream of zeroes        1.46           1.02           1.00
  random ASCII data       10.44          3.00           0.96
  ASCII text (dickens)    6.21           3.33           0.94
  binary data (vmlinux)   8.37           3.90           1.02

This means that s390 hardware deflate can provide up to 10 times faster
compression (on level 1) and up to 4 times faster decompression (for
all compression levels) for btrfs zlib.

Disclaimer: Performance results are based on IBM internal tests using the dd
command-line utility on btrfs on a Fedora 30 based internal driver in
native LPAR on a z15 system. Results may vary based on individual
workload, configuration and software levels.

This patch (of 9):

Create zlib_dfltcc library with the s390 DEFLATE CONVERSION CALL
implementation and related compression functions. Update zlib_deflate
functions with the hooks for s390 hardware support and adjust workspace
structures with extra parameter lists required for hardware deflate.

Link: http://lkml.kernel.org/r/20200103223334.20669-2-zaslonko@linux.ibm.com
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Co-developed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Eduard Shishkin <edward6@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Mikhail Zaslonko and committed by Linus Torvalds
aa5b395b f88b4265

+751 -102
+7
lib/Kconfig
···
 	tristate
 	select BITREVERSE

+config ZLIB_DFLTCC
+	def_bool y
+	depends on S390
+	prompt "Enable s390x DEFLATE CONVERSION CALL support for kernel zlib"
+	help
+	 Enable s390x hardware support for zlib in the kernel.
+
 config LZO_COMPRESS
 	tristate

+1
lib/Makefile
···
 obj-$(CONFIG_842_DECOMPRESS) += 842/
 obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
 obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/
+obj-$(CONFIG_ZLIB_DFLTCC) += zlib_dfltcc/
 obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
 obj-$(CONFIG_BCH) += bch.o
 obj-$(CONFIG_LZO_COMPRESS) += lzo/
+41 -38
lib/zlib_deflate/deflate.c
···
 #include <linux/zutil.h>
 #include "defutil.h"

+/* architecture-specific bits */
+#ifdef CONFIG_ZLIB_DFLTCC
+#  include "../zlib_dfltcc/dfltcc.h"
+#else
+#define DEFLATE_RESET_HOOK(strm) do {} while (0)
+#define DEFLATE_HOOK(strm, flush, bstate) 0
+#define DEFLATE_NEED_CHECKSUM(strm) 1
+#endif

 /* ===========================================================================
  * Function prototypes.
  */
-typedef enum {
-    need_more,      /* block not completed, need more input or more output */
-    block_done,     /* block flush performed */
-    finish_started, /* finish started, need only more output at next deflate */
-    finish_done     /* finish done, accept no more input or output */
-} block_state;

 typedef block_state (*compress_func) (deflate_state *s, int flush);
 /* Compression function. Returns the block state after the call. */
···
 static block_state deflate_slow (deflate_state *s, int flush);
 static void lm_init (deflate_state *s);
 static void putShortMSB (deflate_state *s, uInt b);
-static void flush_pending (z_streamp strm);
 static int read_buf (z_streamp strm, Byte *buf, unsigned size);
 static uInt longest_match (deflate_state *s, IPos cur_match);

···
 /* Minimum amount of lookahead, except at the end of the input file.
  * See deflate.c for comments about the MIN_MATCH+1.
  */
+
+/* Workspace to be allocated for deflate processing */
+typedef struct deflate_workspace {
+    /* State memory for the deflator */
+    deflate_state deflate_memory;
+#ifdef CONFIG_ZLIB_DFLTCC
+    /* State memory for s390 hardware deflate */
+    struct dfltcc_state dfltcc_memory;
+#endif
+    Byte *window_memory;
+    Pos *prev_memory;
+    Pos *head_memory;
+    char *overlay_memory;
+} deflate_workspace;
+
+#ifdef CONFIG_ZLIB_DFLTCC
+/* dfltcc_state must be doubleword aligned for DFLTCC call */
+static_assert(offsetof(struct deflate_workspace, dfltcc_memory) % 8 == 0);
+#endif

 /* Values for max_lazy_match, good_match and max_chain_length, depending on
  * the desired pack level (0..9). The values given below have been tuned to
···
      */
     next = (char *) mem;
     next += sizeof(*mem);
+#ifdef CONFIG_ZLIB_DFLTCC
+    /*
+     * DFLTCC requires the window to be page aligned.
+     * Thus, we overallocate and take the aligned portion of the buffer.
+     */
+    mem->window_memory = (Byte *) PTR_ALIGN(next, PAGE_SIZE);
+#else
     mem->window_memory = (Byte *) next;
+#endif
     next += zlib_deflate_window_memsize(windowBits);
     mem->prev_memory = (Pos *) next;
     next += zlib_deflate_prev_memsize(windowBits);
···
     zlib_tr_init(s);
     lm_init(s);

+    DEFLATE_RESET_HOOK(strm);
+
     return Z_OK;
 }

···
     put_byte(s, (Byte)(b >> 8));
     put_byte(s, (Byte)(b & 0xff));
 }
-
-/* =========================================================================
- * Flush as much pending output as possible. All deflate() output goes
- * through this function so some applications may wish to modify it
- * to avoid allocating a large strm->next_out buffer and copying into it.
- * (See also read_buf()).
- */
-static void flush_pending(
-    z_streamp strm
-)
-{
-    deflate_state *s = (deflate_state *) strm->state;
-    unsigned len = s->pending;
-
-    if (len > strm->avail_out) len = strm->avail_out;
-    if (len == 0) return;
-
-    if (strm->next_out != NULL) {
-        memcpy(strm->next_out, s->pending_out, len);
-        strm->next_out += len;
-    }
-    s->pending_out += len;
-    strm->total_out += len;
-    strm->avail_out -= len;
-    s->pending -= len;
-    if (s->pending == 0) {
-        s->pending_out = s->pending_buf;
-    }
-}

 /* ========================================================================= */
 int zlib_deflate(
···
         (flush != Z_NO_FLUSH && s->status != FINISH_STATE)) {
         block_state bstate;

-        bstate = (*(configuration_table[s->level].func))(s, flush);
+        bstate = DEFLATE_HOOK(strm, flush, &bstate) ? bstate :
+                 (*(configuration_table[s->level].func))(s, flush);

         if (bstate == finish_started || bstate == finish_done) {
             s->status = FINISH_STATE;
···

     strm->avail_in -= len;

-    if (!((deflate_state *)(strm->state))->noheader) {
+    if (!DEFLATE_NEED_CHECKSUM(strm)) {}
+    else if (!((deflate_state *)(strm->state))->noheader) {
         strm->adler = zlib_adler32(strm->adler, strm->next_in, len);
     }
     memcpy(buf, strm->next_in, len);
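
The page-alignment hunk in deflate.c relies on overallocating the window by one page and then rounding the start pointer up. A minimal userspace sketch of that idea follows; it assumes a 4096-byte page, and `SKETCH_PAGE_SIZE`, `align_up` and `window_reservation` are hypothetical stand-ins for the kernel's PAGE_SIZE, PTR_ALIGN and zlib_deflate_window_memsize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size, for illustration only */

/* Round a pointer up to the next multiple of align (a power of two). */
static void *align_up(void *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (void *)(((uintptr_t)p + align - 1) & ~(align - 1));
}

/* Bytes to reserve so that an aligned window of window_size always fits,
 * wherever the unaligned start happens to fall within a page. */
static size_t window_reservation(size_t window_size)
{
    return window_size + SKETCH_PAGE_SIZE;
}
```

Rounding up skips at most one page minus one byte, which is why one extra page of reservation is enough.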
-54
lib/zlib_deflate/deftree.c
···
  * probability, to avoid transmitting the lengths for unused bit length codes.
  */

-#define Buf_size (8 * 2*sizeof(char))
-/* Number of bits used within bi_buf. (bi_buf might be implemented on
- * more than 16 bits on some systems.)
- */
-
 /* ===========================================================================
  * Local data. These are initialized only once.
  */
···
 static void compress_block (deflate_state *s, ct_data *ltree,
                             ct_data *dtree);
 static void set_data_type (deflate_state *s);
-static void bi_windup (deflate_state *s);
 static void bi_flush (deflate_state *s);
 static void copy_block (deflate_state *s, char *buf, unsigned len,
                         int header);
···
  * must not have side effects. dist_code[256] and dist_code[257] are never
  * used.
  */
-
-/* ===========================================================================
- * Send a value on a given number of bits.
- * IN assertion: length <= 16 and value fits in length bits.
- */
-#ifdef DEBUG_ZLIB
-static void send_bits (deflate_state *s, int value, int length);
-
-static void send_bits(
-    deflate_state *s,
-    int value,  /* value to send */
-    int length  /* number of bits */
-)
-{
-    Tracevv((stderr," l %2d v %4x ", length, value));
-    Assert(length > 0 && length <= 15, "invalid length");
-    s->bits_sent += (ulg)length;
-
-    /* If not enough room in bi_buf, use (valid) bits from bi_buf and
-     * (16 - bi_valid) bits from value, leaving (width - (16-bi_valid))
-     * unused bits in value.
-     */
-    if (s->bi_valid > (int)Buf_size - length) {
-        s->bi_buf |= (value << s->bi_valid);
-        put_short(s, s->bi_buf);
-        s->bi_buf = (ush)value >> (Buf_size - s->bi_valid);
-        s->bi_valid += length - Buf_size;
-    } else {
-        s->bi_buf |= value << s->bi_valid;
-        s->bi_valid += length;
-    }
-}
-#else /* !DEBUG_ZLIB */
-
-#define send_bits(s, value, length) \
-{ int len = length;\
-  if (s->bi_valid > (int)Buf_size - len) {\
-    int val = value;\
-    s->bi_buf |= (val << s->bi_valid);\
-    put_short(s, s->bi_buf);\
-    s->bi_buf = (ush)val >> (Buf_size - s->bi_valid);\
-    s->bi_valid += len - Buf_size;\
-  } else {\
-    s->bi_buf |= (value) << s->bi_valid;\
-    s->bi_valid += len;\
-  }\
-}
-#endif /* DEBUG_ZLIB */

 /* ===========================================================================
  * Initialize the various 'constant' tables. In a multi-threaded environment,
+124 -10
lib/zlib_deflate/defutil.h
···
+#ifndef DEFUTIL_H
+#define DEFUTIL_H

-
+#include <linux/zutil.h>

 #define Assert(err, str)
 #define Trace(dummy)
···

 } deflate_state;

-typedef struct deflate_workspace {
-    /* State memory for the deflator */
-    deflate_state deflate_memory;
-    Byte *window_memory;
-    Pos *prev_memory;
-    Pos *head_memory;
-    char *overlay_memory;
-} deflate_workspace;
-
+#ifdef CONFIG_ZLIB_DFLTCC
+#define zlib_deflate_window_memsize(windowBits) \
+    (2 * (1 << (windowBits)) * sizeof(Byte) + PAGE_SIZE)
+#else
 #define zlib_deflate_window_memsize(windowBits) \
     (2 * (1 << (windowBits)) * sizeof(Byte))
+#endif
 #define zlib_deflate_prev_memsize(windowBits) \
     ((1 << (windowBits)) * sizeof(Pos))
 #define zlib_deflate_head_memsize(memLevel) \
···
 }

 /* ===========================================================================
+ * Reverse the first len bits of a code, using straightforward code (a faster
+ * method would use a table)
+ * IN assertion: 1 <= len <= 15
+ */
+static inline unsigned bi_reverse(
+    unsigned code, /* the value to invert */
+    int len        /* its bit length */
+)
+{
+    register unsigned res = 0;
+    do {
+        res |= code & 1;
+        code >>= 1, res <<= 1;
+    } while (--len > 0);
+    return res >> 1;
+}
+
+/* ===========================================================================
  * Flush the bit buffer, keeping at most 7 bits in it.
  */
 static inline void bi_flush(deflate_state *s)
···
 #endif
 }

+typedef enum {
+    need_more,      /* block not completed, need more input or more output */
+    block_done,     /* block flush performed */
+    finish_started, /* finish started, need only more output at next deflate */
+    finish_done     /* finish done, accept no more input or output */
+} block_state;
+
+#define Buf_size (8 * 2*sizeof(char))
+/* Number of bits used within bi_buf. (bi_buf might be implemented on
+ * more than 16 bits on some systems.)
+ */
+
+/* ===========================================================================
+ * Send a value on a given number of bits.
+ * IN assertion: length <= 16 and value fits in length bits.
+ */
+#ifdef DEBUG_ZLIB
+static void send_bits (deflate_state *s, int value, int length);
+
+static void send_bits(
+    deflate_state *s,
+    int value,  /* value to send */
+    int length  /* number of bits */
+)
+{
+    Tracevv((stderr," l %2d v %4x ", length, value));
+    Assert(length > 0 && length <= 15, "invalid length");
+    s->bits_sent += (ulg)length;
+
+    /* If not enough room in bi_buf, use (valid) bits from bi_buf and
+     * (16 - bi_valid) bits from value, leaving (width - (16-bi_valid))
+     * unused bits in value.
+     */
+    if (s->bi_valid > (int)Buf_size - length) {
+        s->bi_buf |= (value << s->bi_valid);
+        put_short(s, s->bi_buf);
+        s->bi_buf = (ush)value >> (Buf_size - s->bi_valid);
+        s->bi_valid += length - Buf_size;
+    } else {
+        s->bi_buf |= value << s->bi_valid;
+        s->bi_valid += length;
+    }
+}
+#else /* !DEBUG_ZLIB */
+
+#define send_bits(s, value, length) \
+{ int len = length;\
+  if (s->bi_valid > (int)Buf_size - len) {\
+    int val = value;\
+    s->bi_buf |= (val << s->bi_valid);\
+    put_short(s, s->bi_buf);\
+    s->bi_buf = (ush)val >> (Buf_size - s->bi_valid);\
+    s->bi_valid += len - Buf_size;\
+  } else {\
+    s->bi_buf |= (value) << s->bi_valid;\
+    s->bi_valid += len;\
+  }\
+}
+#endif /* DEBUG_ZLIB */
+
+static inline void zlib_tr_send_bits(
+    deflate_state *s,
+    int value,
+    int length
+)
+{
+    send_bits(s, value, length);
+}
+
+/* =========================================================================
+ * Flush as much pending output as possible. All deflate() output goes
+ * through this function so some applications may wish to modify it
+ * to avoid allocating a large strm->next_out buffer and copying into it.
+ * (See also read_buf()).
+ */
+static inline void flush_pending(
+    z_streamp strm
+)
+{
+    deflate_state *s = (deflate_state *) strm->state;
+    unsigned len = s->pending;
+
+    if (len > strm->avail_out) len = strm->avail_out;
+    if (len == 0) return;
+
+    if (strm->next_out != NULL) {
+        memcpy(strm->next_out, s->pending_out, len);
+        strm->next_out += len;
+    }
+    s->pending_out += len;
+    strm->total_out += len;
+    strm->avail_out -= len;
+    s->pending -= len;
+    if (s->pending == 0) {
+        s->pending_out = s->pending_buf;
+    }
+}
+#endif /* DEFUTIL_H */
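
The bi_reverse() helper moved into defutil.h can be exercised on its own. The following is a standalone userspace copy of the same loop, with the comma expression split into two statements and an assertion added for the documented precondition; it is a sketch for experimentation, not a replacement for the kernel version.

```c
#include <assert.h>

/* Reverse the low `len` bits of `code` (1 <= len <= 15). */
static unsigned bi_reverse(unsigned code, int len)
{
    unsigned res = 0;

    assert(len >= 1 && len <= 15);
    do {
        res |= code & 1; /* copy the lowest remaining bit of code */
        code >>= 1;
        res <<= 1;       /* make room for the next bit */
    } while (--len > 0);
    return res >> 1;     /* undo the final, superfluous shift */
}
```

For example, the 3-bit value 110 (0x6) becomes 011 (0x3), and the 4-bit value 0001 becomes 1000 (0x8).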
+11
lib/zlib_dfltcc/Makefile
···
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# This is a modified version of zlib, which does all memory
+# allocation ahead of time.
+#
+# This is the code for s390 zlib hardware support.
+#
+
+obj-$(CONFIG_ZLIB_DFLTCC) += zlib_dfltcc.o
+
+zlib_dfltcc-objs := dfltcc.o dfltcc_deflate.o dfltcc_syms.o
+52
lib/zlib_dfltcc/dfltcc.c
···
+// SPDX-License-Identifier: Zlib
+/* dfltcc.c - SystemZ DEFLATE CONVERSION CALL support. */
+
+#include <linux/zutil.h>
+#include "dfltcc_util.h"
+#include "dfltcc.h"
+
+char *oesc_msg(
+    char *buf,
+    int oesc
+)
+{
+    if (oesc == 0x00)
+        return NULL; /* Successful completion */
+    else {
+#ifdef STATIC
+        return NULL; /* Ignore for pre-boot decompressor */
+#else
+        sprintf(buf, "Operation-Ending-Supplemental Code is 0x%.2X", oesc);
+        return buf;
+#endif
+    }
+}
+
+void dfltcc_reset(
+    z_streamp strm,
+    uInt size
+)
+{
+    struct dfltcc_state *dfltcc_state =
+        (struct dfltcc_state *)((char *)strm->state + size);
+    struct dfltcc_qaf_param *param =
+        (struct dfltcc_qaf_param *)&dfltcc_state->param;
+
+    /* Initialize available functions */
+    if (is_dfltcc_enabled()) {
+        dfltcc(DFLTCC_QAF, param, NULL, NULL, NULL, NULL, NULL);
+        memmove(&dfltcc_state->af, param, sizeof(dfltcc_state->af));
+    } else
+        memset(&dfltcc_state->af, 0, sizeof(dfltcc_state->af));
+
+    /* Initialize parameter block */
+    memset(&dfltcc_state->param, 0, sizeof(dfltcc_state->param));
+    dfltcc_state->param.nt = 1;
+
+    /* Initialize tuning parameters */
+    dfltcc_state->level_mask = DFLTCC_LEVEL_MASK;
+    dfltcc_state->block_size = DFLTCC_BLOCK_SIZE;
+    dfltcc_state->block_threshold = DFLTCC_FIRST_FHT_BLOCK_SIZE;
+    dfltcc_state->dht_threshold = DFLTCC_DHT_MIN_SAMPLE_SIZE;
+    dfltcc_state->param.ribm = DFLTCC_RIBM;
+}
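
dfltcc_reset() finds the hardware state by stepping `size` bytes past the start of the software state, which only works because both live back to back in one workspace. The sketch below illustrates that layout trick with simplified, hypothetical structures (`base_state`, `ext_state`, `workspace`, `get_ext`, `reset_ext`); the real deflate_state and dfltcc_state are much larger.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct base_state { int level; int status; };                 /* stand-in for deflate_state */
struct ext_state { unsigned long level_mask; char msg[64]; }; /* stand-in for dfltcc_state */

/* One preallocated workspace: the extension directly follows the base. */
struct workspace {
    struct base_state base;
    struct ext_state ext;
};

/* The pointer arithmetic below is only valid if no padding separates them. */
_Static_assert(offsetof(struct workspace, ext) == sizeof(struct base_state),
               "extension must start right after the base state");

/* Locate the extension from the workspace start and the base state's size. */
static struct ext_state *get_ext(void *state, size_t base_size)
{
    assert(state != NULL);
    return (struct ext_state *)((char *)state + base_size);
}

/* Reset only the extension; the base state is left untouched. */
static void reset_ext(void *state, size_t base_size)
{
    struct ext_state *ext = get_ext(state, base_size);

    memset(ext, 0, sizeof(*ext));
    ext->level_mask = 0x2; /* level 1 only, as in DFLTCC_LEVEL_MASK */
}
```

Passing the size as a parameter lets one reset routine serve several base state types, which appears to be why the patch's hook passes sizeof(deflate_state).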
+115
lib/zlib_dfltcc/dfltcc.h
···
+// SPDX-License-Identifier: Zlib
+#ifndef DFLTCC_H
+#define DFLTCC_H
+
+#include "../zlib_deflate/defutil.h"
+
+/*
+ * Tuning parameters.
+ */
+#define DFLTCC_LEVEL_MASK 0x2 /* DFLTCC compression for level 1 only */
+#define DFLTCC_BLOCK_SIZE 1048576
+#define DFLTCC_FIRST_FHT_BLOCK_SIZE 4096
+#define DFLTCC_DHT_MIN_SAMPLE_SIZE 4096
+#define DFLTCC_RIBM 0
+
+/*
+ * Parameter Block for Query Available Functions.
+ */
+struct dfltcc_qaf_param {
+    char fns[16];
+    char reserved1[8];
+    char fmts[2];
+    char reserved2[6];
+};
+
+static_assert(sizeof(struct dfltcc_qaf_param) == 32);
+
+#define DFLTCC_FMT0 0
+
+/*
+ * Parameter Block for Generate Dynamic-Huffman Table, Compress and Expand.
+ */
+struct dfltcc_param_v0 {
+    uint16_t pbvn;             /* Parameter-Block-Version Number */
+    uint8_t mvn;               /* Model-Version Number */
+    uint8_t ribm;              /* Reserved for IBM use */
+    unsigned reserved32 : 31;
+    unsigned cf : 1;           /* Continuation Flag */
+    uint8_t reserved64[8];
+    unsigned nt : 1;           /* New Task */
+    unsigned reserved129 : 1;
+    unsigned cvt : 1;          /* Check Value Type */
+    unsigned reserved131 : 1;
+    unsigned htt : 1;          /* Huffman-Table Type */
+    unsigned bcf : 1;          /* Block-Continuation Flag */
+    unsigned bcc : 1;          /* Block Closing Control */
+    unsigned bhf : 1;          /* Block Header Final */
+    unsigned reserved136 : 1;
+    unsigned reserved137 : 1;
+    unsigned dhtgc : 1;        /* DHT Generation Control */
+    unsigned reserved139 : 5;
+    unsigned reserved144 : 5;
+    unsigned sbb : 3;          /* Sub-Byte Boundary */
+    uint8_t oesc;              /* Operation-Ending-Supplemental Code */
+    unsigned reserved160 : 12;
+    unsigned ifs : 4;          /* Incomplete-Function Status */
+    uint16_t ifl;              /* Incomplete-Function Length */
+    uint8_t reserved192[8];
+    uint8_t reserved256[8];
+    uint8_t reserved320[4];
+    uint16_t hl;               /* History Length */
+    unsigned reserved368 : 1;
+    uint16_t ho : 15;          /* History Offset */
+    uint32_t cv;               /* Check Value */
+    unsigned eobs : 15;        /* End-of-block Symbol */
+    unsigned reserved431: 1;
+    uint8_t eobl : 4;          /* End-of-block Length */
+    unsigned reserved436 : 12;
+    unsigned reserved448 : 4;
+    uint16_t cdhtl : 12;       /* Compressed-Dynamic-Huffman Table
+                                  Length */
+    uint8_t reserved464[6];
+    uint8_t cdht[288];
+    uint8_t reserved[32];
+    uint8_t csb[1152];
+};
+
+static_assert(sizeof(struct dfltcc_param_v0) == 1536);
+
+#define CVT_CRC32 0
+#define CVT_ADLER32 1
+#define HTT_FIXED 0
+#define HTT_DYNAMIC 1
+
+/*
+ * Extension of inflate_state and deflate_state for DFLTCC.
+ */
+struct dfltcc_state {
+    struct dfltcc_param_v0 param; /* Parameter block */
+    struct dfltcc_qaf_param af;   /* Available functions */
+    uLong level_mask;             /* Levels on which to use DFLTCC */
+    uLong block_size;             /* New block each X bytes */
+    uLong block_threshold;        /* New block after total_in > X */
+    uLong dht_threshold;          /* New block only if avail_in >= X */
+    char msg[64];                 /* Buffer for strm->msg */
+};
+
+/* Resides right after inflate_state or deflate_state */
+#define GET_DFLTCC_STATE(state) ((struct dfltcc_state *)((state) + 1))
+
+/* External functions */
+int dfltcc_can_deflate(z_streamp strm);
+int dfltcc_deflate(z_streamp strm,
+                   int flush,
+                   block_state *result);
+void dfltcc_reset(z_streamp strm, uInt size);
+
+#define DEFLATE_RESET_HOOK(strm) \
+    dfltcc_reset((strm), sizeof(deflate_state))
+
+#define DEFLATE_HOOK dfltcc_deflate
+
+#define DEFLATE_NEED_CHECKSUM(strm) (!dfltcc_can_deflate((strm)))
+
+#endif /* DFLTCC_H */
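
The hook macros defined here give deflate.c a single call shape: the hook returns nonzero and fills in a block state when hardware handled the call, and returns 0 when the caller should fall back to software. A small userspace sketch of that contract follows; `hardware_hook`, `software_deflate`, `deflate_step` and the `hw_available` flag are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { need_more, block_done, finish_started, finish_done } block_state;

static int hw_available; /* toggled by the caller in this sketch */

/* Stand-in for the software compressor selected from configuration_table. */
static block_state software_deflate(int flush)
{
    return flush ? block_done : need_more;
}

/* Hypothetical hook: returns 1 and sets *result if it handled the call,
 * 0 to request the software fallback (in which case *result is untouched). */
static int hardware_hook(int flush, block_state *result)
{
    assert(result != NULL);
    (void)flush;
    if (!hw_available)
        return 0;
    *result = finish_done;
    return 1;
}

/* Same shape as the patched call site in zlib_deflate(). */
static block_state deflate_step(int flush)
{
    block_state bstate;

    /* The conditional operator sequences the hook call before bstate is read. */
    bstate = hardware_hook(flush, &bstate) ? bstate : software_deflate(flush);
    return bstate;
}
```

When the hook is compiled out, as in the non-DFLTCC branch of deflate.c where DEFLATE_HOOK expands to 0, the expression reduces to the plain software call.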
+273
lib/zlib_dfltcc/dfltcc_deflate.c
···
+// SPDX-License-Identifier: Zlib
+
+#include "../zlib_deflate/defutil.h"
+#include "dfltcc_util.h"
+#include "dfltcc.h"
+#include <linux/zutil.h>
+
+/*
+ * Compress.
+ */
+int dfltcc_can_deflate(
+    z_streamp strm
+)
+{
+    deflate_state *state = (deflate_state *)strm->state;
+    struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
+
+    /* Unsupported compression settings */
+    if (!dfltcc_are_params_ok(state->level, state->w_bits, state->strategy,
+                              dfltcc_state->level_mask))
+        return 0;
+
+    /* Unsupported hardware */
+    if (!is_bit_set(dfltcc_state->af.fns, DFLTCC_GDHT) ||
+            !is_bit_set(dfltcc_state->af.fns, DFLTCC_CMPR) ||
+            !is_bit_set(dfltcc_state->af.fmts, DFLTCC_FMT0))
+        return 0;
+
+    return 1;
+}
+
+static void dfltcc_gdht(
+    z_streamp strm
+)
+{
+    deflate_state *state = (deflate_state *)strm->state;
+    struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
+    size_t avail_in = strm->avail_in;
+
+    dfltcc(DFLTCC_GDHT,
+           param, NULL, NULL,
+           &strm->next_in, &avail_in, NULL);
+}
+
+static dfltcc_cc dfltcc_cmpr(
+    z_streamp strm
+)
+{
+    deflate_state *state = (deflate_state *)strm->state;
+    struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
+    size_t avail_in = strm->avail_in;
+    size_t avail_out = strm->avail_out;
+    dfltcc_cc cc;
+
+    cc = dfltcc(DFLTCC_CMPR | HBT_CIRCULAR,
+                param, &strm->next_out, &avail_out,
+                &strm->next_in, &avail_in, state->window);
+    strm->total_in += (strm->avail_in - avail_in);
+    strm->total_out += (strm->avail_out - avail_out);
+    strm->avail_in = avail_in;
+    strm->avail_out = avail_out;
+    return cc;
+}
+
+static void send_eobs(
+    z_streamp strm,
+    const struct dfltcc_param_v0 *param
+)
+{
+    deflate_state *state = (deflate_state *)strm->state;
+
+    zlib_tr_send_bits(
+          state,
+          bi_reverse(param->eobs >> (15 - param->eobl), param->eobl),
+          param->eobl);
+    flush_pending(strm);
+    if (state->pending != 0) {
+        /* The remaining data is located in pending_out[0:pending]. If someone
+         * calls put_byte() - this might happen in deflate() - the byte will be
+         * placed into pending_buf[pending], which is incorrect. Move the
+         * remaining data to the beginning of pending_buf so that put_byte() is
+         * usable again.
+         */
+        memmove(state->pending_buf, state->pending_out, state->pending);
+        state->pending_out = state->pending_buf;
+    }
+#ifdef ZLIB_DEBUG
+    state->compressed_len += param->eobl;
+#endif
+}
+
+int dfltcc_deflate(
+    z_streamp strm,
+    int flush,
+    block_state *result
+)
+{
+    deflate_state *state = (deflate_state *)strm->state;
+    struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
+    struct dfltcc_param_v0 *param = &dfltcc_state->param;
+    uInt masked_avail_in;
+    dfltcc_cc cc;
+    int need_empty_block;
+    int soft_bcc;
+    int no_flush;
+
+    if (!dfltcc_can_deflate(strm))
+        return 0;
+
+again:
+    masked_avail_in = 0;
+    soft_bcc = 0;
+    no_flush = flush == Z_NO_FLUSH;
+
+    /* Trailing empty block. Switch to software, except when Continuation Flag
+     * is set, which means that DFLTCC has buffered some output in the
+     * parameter block and needs to be called again in order to flush it.
+     */
+    if (flush == Z_FINISH && strm->avail_in == 0 && !param->cf) {
+        if (param->bcf) {
+            /* A block is still open, and the hardware does not support closing
+             * blocks without adding data. Thus, close it manually.
+             */
+            send_eobs(strm, param);
+            param->bcf = 0;
+        }
+        return 0;
+    }
+
+    if (strm->avail_in == 0 && !param->cf) {
+        *result = need_more;
+        return 1;
+    }
+
+    /* There is an open non-BFINAL block, we are not going to close it just
+     * yet, we have compressed more than DFLTCC_BLOCK_SIZE bytes and we see
+     * more than DFLTCC_DHT_MIN_SAMPLE_SIZE bytes. Open a new block with a new
+     * DHT in order to adapt to a possibly changed input data distribution.
+     */
+    if (param->bcf && no_flush &&
+            strm->total_in > dfltcc_state->block_threshold &&
+            strm->avail_in >= dfltcc_state->dht_threshold) {
+        if (param->cf) {
+            /* We need to flush the DFLTCC buffer before writing the
+             * End-of-block Symbol. Mask the input data and proceed as usual.
+             */
+            masked_avail_in += strm->avail_in;
+            strm->avail_in = 0;
+            no_flush = 0;
+        } else {
+            /* DFLTCC buffer is empty, so we can manually write the
+             * End-of-block Symbol right away.
+             */
+            send_eobs(strm, param);
+            param->bcf = 0;
+            dfltcc_state->block_threshold =
+                strm->total_in + dfltcc_state->block_size;
+            if (strm->avail_out == 0) {
+                *result = need_more;
+                return 1;
+            }
+        }
+    }
+
+    /* The caller gave us too much data. Pass only one block worth of
+     * uncompressed data to DFLTCC and mask the rest, so that on the next
+     * iteration we start a new block.
+     */
+    if (no_flush && strm->avail_in > dfltcc_state->block_size) {
+        masked_avail_in += (strm->avail_in - dfltcc_state->block_size);
+        strm->avail_in = dfltcc_state->block_size;
+    }
+
+    /* When we have an open non-BFINAL deflate block and caller indicates that
+     * the stream is ending, we need to close an open deflate block and open a
+     * BFINAL one.
+     */
+    need_empty_block = flush == Z_FINISH && param->bcf && !param->bhf;
+
+    /* Translate stream to parameter block */
+    param->cvt = CVT_ADLER32;
+    if (!no_flush)
+        /* We need to close a block. Always do this in software - when there is
+         * no input data, the hardware will not honor BCC. */
+        soft_bcc = 1;
+    if (flush == Z_FINISH && !param->bcf)
+        /* We are about to open a BFINAL block, set Block Header Final bit
+         * until the stream ends.
+         */
+        param->bhf = 1;
+    /* DFLTCC-CMPR will write to next_out, so make sure that buffers with
+     * higher precedence are empty.
+     */
+    Assert(state->pending == 0, "There must be no pending bytes");
+    Assert(state->bi_valid < 8, "There must be less than 8 pending bits");
+    param->sbb = (unsigned int)state->bi_valid;
+    if (param->sbb > 0)
+        *strm->next_out = (Byte)state->bi_buf;
+    if (param->hl)
+        param->nt = 0; /* Honor history */
+    param->cv = strm->adler;
+
+    /* When opening a block, choose a Huffman-Table Type */
+    if (!param->bcf) {
+        if (strm->total_in == 0 && dfltcc_state->block_threshold > 0) {
+            param->htt = HTT_FIXED;
+        }
+        else {
+            param->htt = HTT_DYNAMIC;
+            dfltcc_gdht(strm);
+        }
+    }
+
+    /* Deflate */
+    do {
+        cc = dfltcc_cmpr(strm);
+        if (strm->avail_in < 4096 && masked_avail_in > 0)
+            /* We are about to call DFLTCC with a small input buffer, which is
+             * inefficient. Since there is masked data, there will be at least
+             * one more DFLTCC call, so skip the current one and make the next
+             * one handle more data.
+             */
+            break;
+    } while (cc == DFLTCC_CC_AGAIN);
+
+    /* Translate parameter block to stream */
+    strm->msg = oesc_msg(dfltcc_state->msg, param->oesc);
+    state->bi_valid = param->sbb;
+    if (state->bi_valid == 0)
+        state->bi_buf = 0; /* Avoid accessing next_out */
+    else
+        state->bi_buf = *strm->next_out & ((1 << state->bi_valid) - 1);
+    strm->adler = param->cv;
+
+    /* Unmask the input data */
+    strm->avail_in += masked_avail_in;
+    masked_avail_in = 0;
+
+    /* If we encounter an error, it means there is a bug in DFLTCC call */
+    Assert(cc != DFLTCC_CC_OP2_CORRUPT || param->oesc == 0, "BUG");
+
+    /* Update Block-Continuation Flag. It will be used to check whether to call
+     * GDHT the next time.
+     */
+    if (cc == DFLTCC_CC_OK) {
+        if (soft_bcc) {
+            send_eobs(strm, param);
+            param->bcf = 0;
+            dfltcc_state->block_threshold =
+                strm->total_in + dfltcc_state->block_size;
+        } else
+            param->bcf = 1;
+        if (flush == Z_FINISH) {
+            if (need_empty_block)
+                /* Make the current deflate() call also close the stream */
+                return 0;
+            else {
+                bi_windup(state);
+                *result = finish_done;
+            }
+        } else {
+            if (flush == Z_FULL_FLUSH)
+                param->hl = 0; /* Clear history */
+            *result = flush == Z_NO_FLUSH ? need_more : block_done;
+        }
+    } else {
+        param->bcf = 1;
+        *result = need_more;
+    }
+    if (strm->avail_in != 0 && strm->avail_out != 0)
+        goto again; /* deflate() must use all input or all output */
+    return 1;
+}
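
The "masking" used in dfltcc_deflate() temporarily hides part of the caller's input so that a single hardware call sees at most one block's worth of data. The sketch below isolates that bookkeeping in userspace; `struct stream`, `mask_input` and `unmask_input` are hypothetical names, and the 1 MiB block size in the usage is taken from DFLTCC_BLOCK_SIZE.

```c
#include <assert.h>
#include <stddef.h>

struct stream { size_t avail_in; }; /* stand-in for the z_stream input counter */

/* Hide everything beyond block_size from the next compression call.
 * Returns the number of hidden bytes, to be restored with unmask_input(). */
static size_t mask_input(struct stream *strm, size_t block_size)
{
    size_t masked = 0;

    assert(strm != NULL);
    if (strm->avail_in > block_size) {
        masked = strm->avail_in - block_size;
        strm->avail_in = block_size;
    }
    return masked;
}

/* Make the hidden bytes visible again after the call consumed some input. */
static void unmask_input(struct stream *strm, size_t masked)
{
    assert(strm != NULL);
    strm->avail_in += masked;
}
```

Because the masked count is added back rather than the original value being restored, whatever the call consumed stays consumed.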
+17
lib/zlib_dfltcc/dfltcc_syms.c
···
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * linux/lib/zlib_dfltcc/dfltcc_syms.c
+ *
+ * Exported symbols for the s390 zlib dfltcc support.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/zlib.h>
+#include "dfltcc.h"
+
+EXPORT_SYMBOL(dfltcc_can_deflate);
+EXPORT_SYMBOL(dfltcc_deflate);
+EXPORT_SYMBOL(dfltcc_reset);
+MODULE_LICENSE("GPL");
+110
lib/zlib_dfltcc/dfltcc_util.h
···
+// SPDX-License-Identifier: Zlib
+#ifndef DFLTCC_UTIL_H
+#define DFLTCC_UTIL_H
+
+#include <linux/zutil.h>
+#include <asm/facility.h>
+
+/*
+ * C wrapper for the DEFLATE CONVERSION CALL instruction.
+ */
+typedef enum {
+    DFLTCC_CC_OK = 0,
+    DFLTCC_CC_OP1_TOO_SHORT = 1,
+    DFLTCC_CC_OP2_TOO_SHORT = 2,
+    DFLTCC_CC_OP2_CORRUPT = 2,
+    DFLTCC_CC_AGAIN = 3,
+} dfltcc_cc;
+
+#define DFLTCC_QAF 0
+#define DFLTCC_GDHT 1
+#define DFLTCC_CMPR 2
+#define DFLTCC_XPND 4
+#define HBT_CIRCULAR (1 << 7)
+#define HB_BITS 15
+#define HB_SIZE (1 << HB_BITS)
+#define DFLTCC_FACILITY 151
+
+static inline dfltcc_cc dfltcc(
+    int fn,
+    void *param,
+    Byte **op1,
+    size_t *len1,
+    const Byte **op2,
+    size_t *len2,
+    void *hist
+)
+{
+    Byte *t2 = op1 ? *op1 : NULL;
+    size_t t3 = len1 ? *len1 : 0;
+    const Byte *t4 = op2 ? *op2 : NULL;
+    size_t t5 = len2 ? *len2 : 0;
+    register int r0 __asm__("r0") = fn;
+    register void *r1 __asm__("r1") = param;
+    register Byte *r2 __asm__("r2") = t2;
+    register size_t r3 __asm__("r3") = t3;
+    register const Byte *r4 __asm__("r4") = t4;
+    register size_t r5 __asm__("r5") = t5;
+    int cc;
+
+    __asm__ volatile(
+        ".insn rrf,0xb9390000,%[r2],%[r4],%[hist],0\n"
+        "ipm %[cc]\n"
+        : [r2] "+r" (r2)
+        , [r3] "+r" (r3)
+        , [r4] "+r" (r4)
+        , [r5] "+r" (r5)
+        , [cc] "=r" (cc)
+        : [r0] "r" (r0)
+        , [r1] "r" (r1)
+        , [hist] "r" (hist)
+        : "cc", "memory");
+    t2 = r2; t3 = r3; t4 = r4; t5 = r5;
+
+    if (op1)
+        *op1 = t2;
+    if (len1)
+        *len1 = t3;
+    if (op2)
+        *op2 = t4;
+    if (len2)
+        *len2 = t5;
+    return (cc >> 28) & 3;
+}
+
+static inline int is_bit_set(
+    const char *bits,
+    int n
+)
+{
+    return bits[n / 8] & (1 << (7 - (n % 8)));
+}
+
+static inline void turn_bit_off(
+    char *bits,
+    int n
+)
+{
+    bits[n / 8] &= ~(1 << (7 - (n % 8)));
+}
+
+static inline int dfltcc_are_params_ok(
+    int level,
+    uInt window_bits,
+    int strategy,
+    uLong level_mask
+)
+{
+    return (level_mask & (1 << level)) != 0 &&
+        (window_bits == HB_BITS) &&
+        (strategy == Z_DEFAULT_STRATEGY);
+}
+
+static inline int is_dfltcc_enabled(void)
+{
+    return test_facility(DFLTCC_FACILITY);
+}
+
+char *oesc_msg(char *buf, int oesc);
+
+#endif /* DFLTCC_UTIL_H */
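
Two small conventions in dfltcc_util.h are easy to misread: is_bit_set() numbers bits from the most significant bit of byte 0, and the wrapper recovers the condition code with `(cc >> 28) & 3`. The userspace sketch below reproduces both; it uses `unsigned char` and returns a strict 0/1, which is a deliberate simplification of the header's version, and `extract_cc` is a hypothetical name for the final expression of dfltcc().

```c
#include <assert.h>

/* Bit 0 is the most significant bit of byte 0 (big-endian bit numbering). */
static int is_bit_set(const unsigned char *bits, int n)
{
    assert(bits != 0 && n >= 0);
    return (bits[n / 8] & (1u << (7 - (n % 8)))) != 0;
}

/* Mirror of the wrapper's return expression: take the two-bit field
 * at bit positions 28..29 of the value stored by "ipm". */
static int extract_cc(unsigned int ipm_value)
{
    return (int)((ipm_value >> 28) & 3u);
}
```

Under this numbering, function code 2 (DFLTCC_CMPR) corresponds to mask 0x20 in the first byte of the available-functions bitmap.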