async_tx: add support for asynchronous GF multiplication

[ Based on an original patch by Yuri Tikhonov ]

This adds support for doing asynchronous GF multiplication by adding
two additional functions to the async_tx API:

async_gen_syndrome() does simultaneous XOR and Galois field
multiplication of sources.

async_syndrome_val() validates the given source buffers against known P
and Q values.
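
As an illustration, a caller might chain the two operations roughly as
follows. This is only a sketch: NDISKS, stripe_done, verify_done, ctx,
spare, q_page and addr_conv are hypothetical names, not part of this
patch.

	struct page *blocks[NDISKS];	/* data in 0..NDISKS-3, P and Q last */
	struct async_submit_ctl submit;
	struct dma_async_tx_descriptor *tx;
	enum sum_check_flags pqres = 0;

	/* generate both P and Q for the stripe */
	init_async_submit(&submit, ASYNC_TX_ACK, NULL, stripe_done, ctx,
			  addr_conv);
	tx = async_gen_syndrome(blocks, 0, NDISKS, PAGE_SIZE, &submit);

	/* later: confirm P and Q still match the data */
	init_async_submit(&submit, ASYNC_TX_ACK, tx, verify_done, ctx,
			  addr_conv);
	tx = async_syndrome_val(blocks, 0, NDISKS, PAGE_SIZE, &pqres,
				spare, &submit);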

When a request is made to run async_pq against more sources than the
hardware can handle in a single operation, the previously generated P
and Q values must be reused as sources for the next operation. Care
must be taken to remove Q from P' and P from Q'. For example, to
perform a 5-source pq operation with hardware that only supports 4
sources at a time, the following approach is taken:

p, q = PQ(src0, src1, src2, src3, COEF({01}, {02}, {04}, {08}))
p', q' = PQ(p, q, q, src4, COEF({00}, {01}, {00}, {10}))

p' = p + q + q + src4 = p + src4
q' = {00}*p + {01}*q + {00}*q + {10}*src4 = q + {10}*src4

(Addition here is GF(2^8) addition, i.e. XOR, so the repeated q terms
cancel out of p', and the {00} coefficients drop p and the extra q from
q'.)
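
The cancellation can be checked numerically. Below is a small
user-space program (not part of the patch) that multiplies in GF(2^8)
with the 0x11d polynomial and verifies that the two-pass result matches
a direct 5-source syndrome, using single bytes as stand-ins for whole
blocks:

	/* gfcheck.c: build with "gcc -o gfcheck gfcheck.c" */
	#include <assert.h>
	#include <stdint.h>
	#include <stdio.h>

	/* multiply in GF(2^8) modulo the primitive polynomial 0x11d */
	static uint8_t gfmul(uint8_t a, uint8_t b)
	{
		uint8_t r = 0;

		while (b) {
			if (b & 1)
				r ^= a;
			/* shift and reduce: x^8 == x^4+x^3+x^2+1 (0x1d) */
			a = (a << 1) ^ ((a & 0x80) ? 0x1d : 0);
			b >>= 1;
		}
		return r;
	}

	int main(void)
	{
		uint8_t src[5] = { 0xde, 0xad, 0xbe, 0xef, 0x42 };
		uint8_t p = 0, q = 0, coef = 0x01;
		uint8_t p1, q1, p2, q2;
		int i;

		/* direct 5-source syndrome: coefficients {01}..{10} */
		for (i = 0; i < 5; i++) {
			p ^= src[i];
			q ^= gfmul(coef, src[i]);
			coef = gfmul(coef, 0x02);	/* next power of {02} */
		}

		/* pass 1: p1, q1 = PQ(src0..src3, COEF({01}..{08})) */
		p1 = src[0] ^ src[1] ^ src[2] ^ src[3];
		q1 = gfmul(0x01, src[0]) ^ gfmul(0x02, src[1]) ^
		     gfmul(0x04, src[2]) ^ gfmul(0x08, src[3]);

		/* pass 2: PQ(p1, q1, q1, src4, COEF({00}, {01}, {00}, {10})) */
		p2 = p1 ^ q1 ^ q1 ^ src[4];	/* q1 ^ q1 == 0 */
		q2 = gfmul(0x00, p1) ^ gfmul(0x01, q1) ^ gfmul(0x00, q1) ^
		     gfmul(0x10, src[4]);

		assert(p2 == p && q2 == q);
		printf("5-source continuation matches direct syndrome\n");
		return 0;
	}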

Note: 4 is the minimum acceptable maxpq; otherwise we punt to the
synchronous software path.

The DMA_PREP_CONTINUE flag tells the driver to reuse p and q as sources
(in the above manner) and to fill the remaining slots up to maxpq with
the new sources/coefficients.

Note1: Some devices have native support for P+Q continuation and can skip
this extra work. Devices with this capability can advertise it with
dma_set_maxpq. It is up to each driver how to handle the
DMA_PREP_CONTINUE flag.
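
To make the slot accounting concrete, here is how dma_maxpq() (added to
dmaengine.h below) answers for a hypothetical device registered with
dma_set_maxpq(dma, 8, 0), i.e. 8 source slots and no native
continuation:

	dma_maxpq(dma, 0);			/* 8: nothing to replay */
	dma_maxpq(dma, DMA_PREP_CONTINUE);	/* 5: {00}*P, {01}*Q and
						 * {00}*Q occupy 3 slots */
	dma_maxpq(dma, DMA_PREP_CONTINUE |
		       DMA_PREP_PQ_DISABLE_P);	/* 7: only {01}*Q is
						 * replayed */

Had the device been registered with dma_set_maxpq(dma, 8, 1), all three
calls would return 8, since the engine continues P+Q natively.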

Note2: The API supports disabling the generation of P when generating
Q. This is ignored by the synchronous path, but is implemented by some
dma devices to save unnecessary writes. In this case the continuation
algorithm is simplified to only reuse Q as a source.
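
For example, a caller that only wants Q regenerated could pass a NULL P
slot; q_page, disks and submit are placeholder names here:

	/* 'disks' still counts both destination slots */
	blocks[disks-2] = NULL;		/* skip P */
	blocks[disks-1] = q_page;	/* Q destination */
	tx = async_gen_syndrome(blocks, 0, disks, PAGE_SIZE, &submit);

On the synchronous path the missing P is silently redirected to a
throwaway page, which is why 'len' must stay within PAGE_SIZE in this
case.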

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

+493 -9
+3
Documentation/crypto/async-tx-api.txt
···
  xor_val - xor a series of source buffers and set a flag if the
            result is zero.  The implementation attempts to prevent
            writes to memory
+ pq - generate the p+q (raid6 syndrome) from a series of source buffers
+ pq_val - validate that a p and or q buffer are in sync with a given series of
+          sources
 
  3.3 Descriptor management:
  The return value is non-NULL and points to a 'descriptor' when the operation
+1 -1
arch/arm/mach-iop13xx/setup.c
···
 	dma_cap_set(DMA_MEMSET, plat_data->cap_mask);
 	dma_cap_set(DMA_MEMCPY_CRC32C, plat_data->cap_mask);
 	dma_cap_set(DMA_INTERRUPT, plat_data->cap_mask);
-	dma_cap_set(DMA_PQ_XOR, plat_data->cap_mask);
+	dma_cap_set(DMA_PQ, plat_data->cap_mask);
 	dma_cap_set(DMA_PQ_UPDATE, plat_data->cap_mask);
 	dma_cap_set(DMA_PQ_VAL, plat_data->cap_mask);
 	break;
+4
crypto/async_tx/Kconfig
···
 	tristate
 	select ASYNC_CORE
 
+config ASYNC_PQ
+	tristate
+	select ASYNC_CORE
+
+1
crypto/async_tx/Makefile
···
 obj-$(CONFIG_ASYNC_MEMCPY) += async_memcpy.o
 obj-$(CONFIG_ASYNC_MEMSET) += async_memset.o
 obj-$(CONFIG_ASYNC_XOR) += async_xor.o
+obj-$(CONFIG_ASYNC_PQ) += async_pq.o
+388
crypto/async_tx/async_pq.c
···
+/*
+ * Copyright(c) 2007 Yuri Tikhonov <yur@emcraft.com>
+ * Copyright(c) 2009 Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+#include <linux/kernel.h>
+#include <linux/interrupt.h>
+#include <linux/dma-mapping.h>
+#include <linux/raid/pq.h>
+#include <linux/async_tx.h>
+
+/**
+ * scribble - space to hold throwaway P buffer for synchronous gen_syndrome
+ */
+static struct page *scribble;
+
+static bool is_raid6_zero_block(struct page *p)
+{
+	return p == (void *) raid6_empty_zero_page;
+}
+
+/* the struct page *blocks[] parameter passed to async_gen_syndrome()
+ * and async_syndrome_val() contains the 'P' destination address at
+ * blocks[disks-2] and the 'Q' destination address at blocks[disks-1]
+ *
+ * note: these are macros as they are used as lvalues
+ */
+#define P(b, d) (b[d-2])
+#define Q(b, d) (b[d-1])
+
+/**
+ * do_async_gen_syndrome - asynchronously calculate P and/or Q
+ */
+static __async_inline struct dma_async_tx_descriptor *
+do_async_gen_syndrome(struct dma_chan *chan, struct page **blocks,
+		      const unsigned char *scfs, unsigned int offset, int disks,
+		      size_t len, dma_addr_t *dma_src,
+		      struct async_submit_ctl *submit)
+{
+	struct dma_async_tx_descriptor *tx = NULL;
+	struct dma_device *dma = chan->device;
+	enum dma_ctrl_flags dma_flags = 0;
+	enum async_tx_flags flags_orig = submit->flags;
+	dma_async_tx_callback cb_fn_orig = submit->cb_fn;
+	void *cb_param_orig = submit->cb_param;
+	int src_cnt = disks - 2;
+	unsigned char coefs[src_cnt];
+	unsigned short pq_src_cnt;
+	dma_addr_t dma_dest[2];
+	int src_off = 0;
+	int idx;
+	int i;
+
+	/* DMAs use destinations as sources, so use BIDIRECTIONAL mapping */
+	if (P(blocks, disks))
+		dma_dest[0] = dma_map_page(dma->dev, P(blocks, disks), offset,
+					   len, DMA_BIDIRECTIONAL);
+	else
+		dma_flags |= DMA_PREP_PQ_DISABLE_P;
+	if (Q(blocks, disks))
+		dma_dest[1] = dma_map_page(dma->dev, Q(blocks, disks), offset,
+					   len, DMA_BIDIRECTIONAL);
+	else
+		dma_flags |= DMA_PREP_PQ_DISABLE_Q;
+
+	/* convert source addresses being careful to collapse 'empty'
+	 * sources and update the coefficients accordingly
+	 */
+	for (i = 0, idx = 0; i < src_cnt; i++) {
+		if (is_raid6_zero_block(blocks[i]))
+			continue;
+		dma_src[idx] = dma_map_page(dma->dev, blocks[i], offset, len,
+					    DMA_TO_DEVICE);
+		coefs[idx] = scfs[i];
+		idx++;
+	}
+	src_cnt = idx;
+
+	while (src_cnt > 0) {
+		submit->flags = flags_orig;
+		pq_src_cnt = min(src_cnt, dma_maxpq(dma, dma_flags));
+		/* if we are submitting additional pqs, leave the chain open,
+		 * clear the callback parameters, and leave the destination
+		 * buffers mapped
+		 */
+		if (src_cnt > pq_src_cnt) {
+			submit->flags &= ~ASYNC_TX_ACK;
+			dma_flags |= DMA_COMPL_SKIP_DEST_UNMAP;
+			submit->cb_fn = NULL;
+			submit->cb_param = NULL;
+		} else {
+			dma_flags &= ~DMA_COMPL_SKIP_DEST_UNMAP;
+			submit->cb_fn = cb_fn_orig;
+			submit->cb_param = cb_param_orig;
+			if (cb_fn_orig)
+				dma_flags |= DMA_PREP_INTERRUPT;
+		}
+
+		/* Since we have clobbered the src_list we are committed
+		 * to doing this asynchronously.  Drivers force forward
+		 * progress in case they can not provide a descriptor
+		 */
+		for (;;) {
+			tx = dma->device_prep_dma_pq(chan, dma_dest,
+						     &dma_src[src_off],
+						     pq_src_cnt,
+						     &coefs[src_off], len,
+						     dma_flags);
+			if (likely(tx))
+				break;
+			async_tx_quiesce(&submit->depend_tx);
+			dma_async_issue_pending(chan);
+		}
+
+		async_tx_submit(chan, tx, submit);
+		submit->depend_tx = tx;
+
+		/* drop completed sources */
+		src_cnt -= pq_src_cnt;
+		src_off += pq_src_cnt;
+
+		dma_flags |= DMA_PREP_CONTINUE;
+	}
+
+	return tx;
+}
+
+/**
+ * do_sync_gen_syndrome - synchronously calculate a raid6 syndrome
+ */
+static void
+do_sync_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
+		     size_t len, struct async_submit_ctl *submit)
+{
+	void **srcs;
+	int i;
+
+	if (submit->scribble)
+		srcs = submit->scribble;
+	else
+		srcs = (void **) blocks;
+
+	for (i = 0; i < disks; i++) {
+		if (is_raid6_zero_block(blocks[i])) {
+			BUG_ON(i > disks - 3); /* P or Q can't be zero */
+			srcs[i] = blocks[i];
+		} else
+			srcs[i] = page_address(blocks[i]) + offset;
+	}
+	raid6_call.gen_syndrome(disks, len, srcs);
+	async_tx_sync_epilog(submit);
+}
+
+/**
+ * async_gen_syndrome - asynchronously calculate a raid6 syndrome
+ * @blocks: source blocks from idx 0..disks-3, P @ disks-2 and Q @ disks-1
+ * @offset: common offset into each block (src and dest) to start transaction
+ * @disks: number of blocks (including missing P or Q, see below)
+ * @len: length of operation in bytes
+ * @submit: submission/completion modifiers
+ *
+ * General note: This routine assumes a field of GF(2^8) with a
+ * primitive polynomial of 0x11d and a generator of {02}.
+ *
+ * 'disks' note: callers can optionally omit either P or Q (but not
+ * both) from the calculation by setting blocks[disks-2] or
+ * blocks[disks-1] to NULL.  When P or Q is omitted 'len' must be <=
+ * PAGE_SIZE as a temporary buffer of this size is used in the
+ * synchronous path.  'disks' always accounts for both destination
+ * buffers.
+ *
+ * 'blocks' note: if submit->scribble is NULL then the contents of
+ * 'blocks' may be overridden
+ */
+struct dma_async_tx_descriptor *
+async_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
+		   size_t len, struct async_submit_ctl *submit)
+{
+	int src_cnt = disks - 2;
+	struct dma_chan *chan = async_tx_find_channel(submit, DMA_PQ,
+						      &P(blocks, disks), 2,
+						      blocks, src_cnt, len);
+	struct dma_device *device = chan ? chan->device : NULL;
+	dma_addr_t *dma_src = NULL;
+
+	BUG_ON(disks > 255 || !(P(blocks, disks) || Q(blocks, disks)));
+
+	if (submit->scribble)
+		dma_src = submit->scribble;
+	else if (sizeof(dma_addr_t) <= sizeof(struct page *))
+		dma_src = (dma_addr_t *) blocks;
+
+	if (dma_src && device &&
+	    (src_cnt <= dma_maxpq(device, 0) ||
+	     dma_maxpq(device, DMA_PREP_CONTINUE) > 0)) {
+		/* run the p+q asynchronously */
+		pr_debug("%s: (async) disks: %d len: %zu\n",
+			 __func__, disks, len);
+		return do_async_gen_syndrome(chan, blocks, raid6_gfexp, offset,
+					     disks, len, dma_src, submit);
+	}
+
+	/* run the pq synchronously */
+	pr_debug("%s: (sync) disks: %d len: %zu\n", __func__, disks, len);
+
+	/* wait for any prerequisite operations */
+	async_tx_quiesce(&submit->depend_tx);
+
+	if (!P(blocks, disks)) {
+		P(blocks, disks) = scribble;
+		BUG_ON(len + offset > PAGE_SIZE);
+	}
+	if (!Q(blocks, disks)) {
+		Q(blocks, disks) = scribble;
+		BUG_ON(len + offset > PAGE_SIZE);
+	}
+	do_sync_gen_syndrome(blocks, offset, disks, len, submit);
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(async_gen_syndrome);
+
+/**
+ * async_syndrome_val - asynchronously validate a raid6 syndrome
+ * @blocks: source blocks from idx 0..disks-3, P @ disks-2 and Q @ disks-1
+ * @offset: common offset into each block (src and dest) to start transaction
+ * @disks: number of blocks (including missing P or Q, see below)
+ * @len: length of operation in bytes
+ * @pqres: on val failure SUM_CHECK_P_RESULT and/or SUM_CHECK_Q_RESULT are set
+ * @spare: temporary result buffer for the synchronous case
+ * @submit: submission / completion modifiers
+ *
+ * The same notes from async_gen_syndrome apply to the 'blocks'
+ * and 'disks' parameters of this routine.  The synchronous path
+ * requires a temporary result buffer and submit->scribble to be
+ * specified.
+ */
+struct dma_async_tx_descriptor *
+async_syndrome_val(struct page **blocks, unsigned int offset, int disks,
+		   size_t len, enum sum_check_flags *pqres, struct page *spare,
+		   struct async_submit_ctl *submit)
+{
+	struct dma_chan *chan = async_tx_find_channel(submit, DMA_PQ_VAL,
+						      NULL, 0, blocks, disks,
+						      len);
+	struct dma_device *device = chan ? chan->device : NULL;
+	struct dma_async_tx_descriptor *tx;
+	enum dma_ctrl_flags dma_flags = submit->cb_fn ? DMA_PREP_INTERRUPT : 0;
+	dma_addr_t *dma_src = NULL;
+
+	BUG_ON(disks < 4);
+
+	if (submit->scribble)
+		dma_src = submit->scribble;
+	else if (sizeof(dma_addr_t) <= sizeof(struct page *))
+		dma_src = (dma_addr_t *) blocks;
+
+	if (dma_src && device && disks <= dma_maxpq(device, 0)) {
+		struct device *dev = device->dev;
+		dma_addr_t *pq = &dma_src[disks-2];
+		int i;
+
+		pr_debug("%s: (async) disks: %d len: %zu\n",
+			 __func__, disks, len);
+		if (!P(blocks, disks))
+			dma_flags |= DMA_PREP_PQ_DISABLE_P;
+		if (!Q(blocks, disks))
+			dma_flags |= DMA_PREP_PQ_DISABLE_Q;
+		for (i = 0; i < disks; i++)
+			if (likely(blocks[i])) {
+				BUG_ON(is_raid6_zero_block(blocks[i]));
+				dma_src[i] = dma_map_page(dev, blocks[i],
+							  offset, len,
+							  DMA_TO_DEVICE);
+			}
+
+		for (;;) {
+			tx = device->device_prep_dma_pq_val(chan, pq, dma_src,
+							    disks - 2,
+							    raid6_gfexp,
+							    len, pqres,
+							    dma_flags);
+			if (likely(tx))
+				break;
+			async_tx_quiesce(&submit->depend_tx);
+			dma_async_issue_pending(chan);
+		}
+		async_tx_submit(chan, tx, submit);
+
+		return tx;
+	} else {
+		struct page *p_src = P(blocks, disks);
+		struct page *q_src = Q(blocks, disks);
+		enum async_tx_flags flags_orig = submit->flags;
+		dma_async_tx_callback cb_fn_orig = submit->cb_fn;
+		void *scribble = submit->scribble;
+		void *cb_param_orig = submit->cb_param;
+		void *p, *q, *s;
+
+		pr_debug("%s: (sync) disks: %d len: %zu\n",
+			 __func__, disks, len);
+
+		/* caller must provide a temporary result buffer and
+		 * allow the input parameters to be preserved
+		 */
+		BUG_ON(!spare || !scribble);
+
+		/* wait for any prerequisite operations */
+		async_tx_quiesce(&submit->depend_tx);
+
+		/* recompute p and/or q into the temporary buffer and then
+		 * check to see the result matches the current value
+		 */
+		tx = NULL;
+		*pqres = 0;
+		if (p_src) {
+			init_async_submit(submit, ASYNC_TX_XOR_ZERO_DST, NULL,
+					  NULL, NULL, scribble);
+			tx = async_xor(spare, blocks, offset, disks-2, len, submit);
+			async_tx_quiesce(&tx);
+			p = page_address(p_src) + offset;
+			s = page_address(spare) + offset;
+			*pqres |= !!memcmp(p, s, len) << SUM_CHECK_P;
+		}
+
+		if (q_src) {
+			P(blocks, disks) = NULL;
+			Q(blocks, disks) = spare;
+			init_async_submit(submit, 0, NULL, NULL, NULL, scribble);
+			tx = async_gen_syndrome(blocks, offset, disks, len, submit);
+			async_tx_quiesce(&tx);
+			q = page_address(q_src) + offset;
+			s = page_address(spare) + offset;
+			*pqres |= !!memcmp(q, s, len) << SUM_CHECK_Q;
+		}
+
+		/* restore P, Q and submit */
+		P(blocks, disks) = p_src;
+		Q(blocks, disks) = q_src;
+
+		submit->cb_fn = cb_fn_orig;
+		submit->cb_param = cb_param_orig;
+		submit->flags = flags_orig;
+		async_tx_sync_epilog(submit);
+
+		return NULL;
+	}
+}
+EXPORT_SYMBOL_GPL(async_syndrome_val);
+
+static int __init async_pq_init(void)
+{
+	scribble = alloc_page(GFP_KERNEL);
+
+	if (scribble)
+		return 0;
+
+	pr_err("%s: failed to allocate required spare page\n", __func__);
+
+	return -ENOMEM;
+}
+
+static void __exit async_pq_exit(void)
+{
+	put_page(scribble);
+}
+
+module_init(async_pq_init);
+module_exit(async_pq_exit);
+
+MODULE_DESCRIPTION("asynchronous raid6 syndrome generation/validation");
+MODULE_LICENSE("GPL");
+1 -1
crypto/async_tx/async_xor.c
···
 	while (src_cnt) {
 		submit->flags = flags_orig;
 		dma_flags = 0;
-		xor_src_cnt = min(src_cnt, dma->max_xor);
+		xor_src_cnt = min(src_cnt, (int)dma->max_xor);
 		/* if we are submitting additional xors, leave the chain open,
 		 * clear the callback parameters, and leave the destination
 		 * buffer mapped
+4
drivers/dma/dmaengine.c
···
 	       !device->device_prep_dma_xor);
 	BUG_ON(dma_has_cap(DMA_XOR_VAL, device->cap_mask) &&
 	       !device->device_prep_dma_xor_val);
+	BUG_ON(dma_has_cap(DMA_PQ, device->cap_mask) &&
+	       !device->device_prep_dma_pq);
+	BUG_ON(dma_has_cap(DMA_PQ_VAL, device->cap_mask) &&
+	       !device->device_prep_dma_pq_val);
 	BUG_ON(dma_has_cap(DMA_MEMSET, device->cap_mask) &&
 	       !device->device_prep_dma_memset);
 	BUG_ON(dma_has_cap(DMA_INTERRUPT, device->cap_mask) &&
+1 -1
drivers/dma/iop-adma.c
···
 
 	dev_printk(KERN_INFO, &pdev->dev, "Intel(R) IOP: "
 		   "( %s%s%s%s%s%s%s%s%s%s)\n",
-		   dma_has_cap(DMA_PQ_XOR, dma_dev->cap_mask) ? "pq_xor " : "",
+		   dma_has_cap(DMA_PQ, dma_dev->cap_mask) ? "pq " : "",
 		   dma_has_cap(DMA_PQ_UPDATE, dma_dev->cap_mask) ? "pq_update " : "",
 		   dma_has_cap(DMA_PQ_VAL, dma_dev->cap_mask) ? "pq_val " : "",
 		   dma_has_cap(DMA_XOR, dma_dev->cap_mask) ? "xor " : "",
+9
include/linux/async_tx.h
···
 
 struct dma_async_tx_descriptor *async_trigger_callback(struct async_submit_ctl *submit);
 
+struct dma_async_tx_descriptor *
+async_gen_syndrome(struct page **blocks, unsigned int offset, int src_cnt,
+		   size_t len, struct async_submit_ctl *submit);
+
+struct dma_async_tx_descriptor *
+async_syndrome_val(struct page **blocks, unsigned int offset, int src_cnt,
+		   size_t len, enum sum_check_flags *pqres, struct page *spare,
+		   struct async_submit_ctl *submit);
+
 void async_tx_quiesce(struct dma_async_tx_descriptor **tx);
 #endif /* _ASYNC_TX_H_ */
+81 -6
include/linux/dmaengine.h
···
 enum dma_transaction_type {
 	DMA_MEMCPY,
 	DMA_XOR,
-	DMA_PQ_XOR,
+	DMA_PQ,
 	DMA_DUAL_XOR,
 	DMA_PQ_UPDATE,
 	DMA_XOR_VAL,
···
 
 /**
  * enum dma_ctrl_flags - DMA flags to augment operation preparation,
- *	control completion, and communicate status.
+ *  control completion, and communicate status.
  * @DMA_PREP_INTERRUPT - trigger an interrupt (callback) upon completion of
- *	this transaction
+ *  this transaction
  * @DMA_CTRL_ACK - the descriptor cannot be reused until the client
- *	acknowledges receipt, i.e. has has a chance to establish any
- *	dependency chains
+ *  acknowledges receipt, i.e. has a chance to establish any dependency
+ *  chains
  * @DMA_COMPL_SKIP_SRC_UNMAP - set to disable dma-unmapping the source buffer(s)
  * @DMA_COMPL_SKIP_DEST_UNMAP - set to disable dma-unmapping the destination(s)
+ * @DMA_PREP_PQ_DISABLE_P - prevent generation of P while generating Q
+ * @DMA_PREP_PQ_DISABLE_Q - prevent generation of Q while generating P
+ * @DMA_PREP_CONTINUE - indicate to a driver that it is reusing buffers as
+ *  sources that were the result of a previous operation, in the case of a PQ
+ *  operation it continues the calculation with new sources
  */
 enum dma_ctrl_flags {
 	DMA_PREP_INTERRUPT = (1 << 0),
 	DMA_CTRL_ACK = (1 << 1),
 	DMA_COMPL_SKIP_SRC_UNMAP = (1 << 2),
 	DMA_COMPL_SKIP_DEST_UNMAP = (1 << 3),
+	DMA_PREP_PQ_DISABLE_P = (1 << 4),
+	DMA_PREP_PQ_DISABLE_Q = (1 << 5),
+	DMA_PREP_CONTINUE = (1 << 6),
 };
···
  * @global_node: list_head for global dma_device_list
  * @cap_mask: one or more dma_capability flags
  * @max_xor: maximum number of xor sources, 0 if no capability
+ * @max_pq: maximum number of PQ sources and PQ-continue capability
  * @dev_id: unique device ID
  * @dev: struct device reference for dma mapping api
  * @device_alloc_chan_resources: allocate resources and return the
···
  * @device_prep_dma_memcpy: prepares a memcpy operation
  * @device_prep_dma_xor: prepares a xor operation
  * @device_prep_dma_xor_val: prepares a xor validation operation
+ * @device_prep_dma_pq: prepares a pq operation
+ * @device_prep_dma_pq_val: prepares a pqzero_sum operation
  * @device_prep_dma_memset: prepares a memset operation
  * @device_prep_dma_interrupt: prepares an end of chain interrupt operation
  * @device_prep_slave_sg: prepares a slave dma operation
···
 	struct list_head channels;
 	struct list_head global_node;
 	dma_cap_mask_t cap_mask;
-	int max_xor;
+	unsigned short max_xor;
+	unsigned short max_pq;
+	#define DMA_HAS_PQ_CONTINUE (1 << 15)
 
 	int dev_id;
 	struct device *dev;
···
 	struct dma_async_tx_descriptor *(*device_prep_dma_xor_val)(
 		struct dma_chan *chan, dma_addr_t *src, unsigned int src_cnt,
 		size_t len, enum sum_check_flags *result, unsigned long flags);
+	struct dma_async_tx_descriptor *(*device_prep_dma_pq)(
+		struct dma_chan *chan, dma_addr_t *dst, dma_addr_t *src,
+		unsigned int src_cnt, const unsigned char *scf,
+		size_t len, unsigned long flags);
+	struct dma_async_tx_descriptor *(*device_prep_dma_pq_val)(
+		struct dma_chan *chan, dma_addr_t *pq, dma_addr_t *src,
+		unsigned int src_cnt, const unsigned char *scf, size_t len,
+		enum sum_check_flags *pqres, unsigned long flags);
 	struct dma_async_tx_descriptor *(*device_prep_dma_memset)(
 		struct dma_chan *chan, dma_addr_t dest, int value, size_t len,
 		unsigned long flags);
···
 		dma_cookie_t *used);
 	void (*device_issue_pending)(struct dma_chan *chan);
 };
+
+static inline void
+dma_set_maxpq(struct dma_device *dma, int maxpq, int has_pq_continue)
+{
+	dma->max_pq = maxpq;
+	if (has_pq_continue)
+		dma->max_pq |= DMA_HAS_PQ_CONTINUE;
+}
+
+static inline bool dmaf_continue(enum dma_ctrl_flags flags)
+{
+	return (flags & DMA_PREP_CONTINUE) == DMA_PREP_CONTINUE;
+}
+
+static inline bool dmaf_p_disabled_continue(enum dma_ctrl_flags flags)
+{
+	enum dma_ctrl_flags mask = DMA_PREP_CONTINUE | DMA_PREP_PQ_DISABLE_P;
+
+	return (flags & mask) == mask;
+}
+
+static inline bool dma_dev_has_pq_continue(struct dma_device *dma)
+{
+	return (dma->max_pq & DMA_HAS_PQ_CONTINUE) == DMA_HAS_PQ_CONTINUE;
+}
+
+static inline unsigned short dma_dev_to_maxpq(struct dma_device *dma)
+{
+	return dma->max_pq & ~DMA_HAS_PQ_CONTINUE;
+}
+
+/* dma_maxpq - reduce maxpq in the face of continued operations
+ * @dma - dma device with PQ capability
+ * @flags - to check if DMA_PREP_CONTINUE and DMA_PREP_PQ_DISABLE_P are set
+ *
+ * When an engine does not support native continuation we need 3 extra
+ * source slots to reuse P and Q with the following coefficients:
+ * 1/ {00} * P : remove P from Q', but use it as a source for P'
+ * 2/ {01} * Q : use Q to continue Q' calculation
+ * 3/ {00} * Q : subtract Q from P' to cancel (2)
+ *
+ * In the case where P is disabled we only need 1 extra source:
+ * 1/ {01} * Q : use Q to continue Q' calculation
+ */
+static inline int dma_maxpq(struct dma_device *dma, enum dma_ctrl_flags flags)
+{
+	if (dma_dev_has_pq_continue(dma) || !dmaf_continue(flags))
+		return dma_dev_to_maxpq(dma);
+	else if (dmaf_p_disabled_continue(flags))
+		return dma_dev_to_maxpq(dma) - 1;
+	else if (dmaf_continue(flags))
+		return dma_dev_to_maxpq(dma) - 3;
+	BUG();
+}
 
 /* --- public DMA engine API --- */
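
A driver would typically advertise these capabilities at probe time;
foo_prep_dma_pq/foo_prep_dma_pq_val and the limit of 16 sources below
are hypothetical:

	dma_cap_set(DMA_PQ, dma_dev->cap_mask);
	dma_cap_set(DMA_PQ_VAL, dma_dev->cap_mask);
	/* 16 PQ sources; last argument is nonzero only if the engine
	 * can natively continue a P+Q calculation across operations
	 */
	dma_set_maxpq(dma_dev, 16, 0);
	dma_dev->device_prep_dma_pq = foo_prep_dma_pq;
	dma_dev->device_prep_dma_pq_val = foo_prep_dma_pq_val;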