drm/amdkfd: AIP mGPUs best prefetch location for xnack on

Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git


With xnack on, if the range is ACCESS or ACCESS_IN_PLACE (AIP) by a single
GPU, or the range is ACCESS_IN_PLACE by multiple GPUs (mGPUs) that are all
connected over XGMI in the same hive, the best prefetch location is the
prefetch_loc GPU. Otherwise, the best prefetch location is always the CPU,
because a GPU does not have a coherent mapping of the VRAM of other GPUs,
even over a large-BAR PCIe connection.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Authored by Philip Yang and committed by Alex Deucher 4 years ago (eff8cbf0 f5bd5239)

+19 -16

1 changed file


drivers/gpu/drm/amd/amdkfd/kfd_svm.c

···
 	return 0;
 }
 
-/* svm_range_best_prefetch_location - decide the best prefetch location
+/**
+ * svm_range_best_prefetch_location - decide the best prefetch location
  * @prange: svm range structure
  *
  * For xnack off:
- * If range map to single GPU, the best acutal location is prefetch loc, which
+ * If range map to single GPU, the best prefetch location is prefetch_loc, which
  * can be CPU or GPU.
  *
- * If range map to multiple GPUs, only if mGPU connection on xgmi same hive,
- * the best actual location could be prefetch_loc GPU. If mGPU connection on
- * PCIe, the best actual location is always CPU, because GPU cannot access vram
- * of other GPUs, assuming PCIe small bar (large bar support is not upstream).
+ * If range is ACCESS or ACCESS_IN_PLACE by mGPUs, only if mGPU connection on
+ * XGMI same hive, the best prefetch location is prefetch_loc GPU, othervise
+ * the best prefetch location is always CPU, because GPU can not have coherent
+ * mapping VRAM of other GPUs even with large-BAR PCIe connection.
  *
  * For xnack on:
- * The best actual location is prefetch location. If mGPU connection on xgmi
- * same hive, range map to multiple GPUs. Otherwise, the range only map to
- * actual location GPU. Other GPU access vm fault will trigger migration.
+ * If range is not ACCESS_IN_PLACE by mGPUs, the best prefetch location is
+ * prefetch_loc, other GPU access will generate vm fault and trigger migration.
+ *
+ * If range is ACCESS_IN_PLACE by mGPUs, only if mGPU connection on XGMI same
+ * hive, the best prefetch location is prefetch_loc GPU, otherwise the best
+ * prefetch location is always CPU.
  *
  * Context: Process context
  *
···
 
 	p = container_of(prange->svms, struct kfd_process, svms);
 
-	/* xnack on */
-	if (p->xnack_enabled)
-		goto out;
-
-	/* xnack off */
 	if (!best_loc || best_loc == KFD_IOCTL_SVM_LOCATION_UNDEFINED)
 		goto out;
 
···
 		best_loc = 0;
 		goto out;
 	}
-	bitmap_or(bitmap, prange->bitmap_access, prange->bitmap_aip,
-		  MAX_GPU_INSTANCE);
+
+	if (p->xnack_enabled)
+		bitmap_copy(bitmap, prange->bitmap_aip, MAX_GPU_INSTANCE);
+	else
+		bitmap_or(bitmap, prange->bitmap_access, prange->bitmap_aip,
+			  MAX_GPU_INSTANCE);
 
 	for_each_set_bit(gpuidx, bitmap, MAX_GPU_INSTANCE) {
 		pdd = kfd_process_device_from_gpuidx(p, gpuidx);
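
The rule described in the commit message and the updated kernel-doc comment can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions, not the driver code: `struct model_gpu`, `model_same_hive()`, `model_best_prefetch_loc()` and `MODEL_LOC_CPU` are hypothetical names, GPUs are held in a plain array, and the access bitmaps are reduced to 32-bit masks indexed by array position. The loop body is inferred from the comment text, since the diff does not show it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: 0 stands for the CPU (system memory), mirroring best_loc = 0. */
#define MODEL_LOC_CPU 0u

/* Hypothetical per-GPU description; not a kernel structure. */
struct model_gpu {
	uint32_t id;   /* GPU id as used for prefetch_loc */
	uint32_t hive; /* XGMI hive id; 0 means the GPU is in no hive */
};

/* Two GPUs share a hive only if both are in the same non-zero hive. */
bool model_same_hive(const struct model_gpu *a, const struct model_gpu *b)
{
	return a->hive != 0 && a->hive == b->hive;
}

/*
 * access_mask / aip_mask: bit i set means gpus[i] has ACCESS or
 * ACCESS_IN_PLACE on the range. Returns the best prefetch location.
 */
uint32_t model_best_prefetch_loc(const struct model_gpu *gpus, unsigned int ngpus,
				 uint32_t prefetch_loc, uint32_t access_mask,
				 uint32_t aip_mask, bool xnack_enabled)
{
	const struct model_gpu *target = NULL;
	uint32_t candidates;
	unsigned int i;

	if (prefetch_loc == MODEL_LOC_CPU)
		return MODEL_LOC_CPU;

	for (i = 0; i < ngpus; i++)
		if (gpus[i].id == prefetch_loc)
			target = &gpus[i];
	if (!target)
		return MODEL_LOC_CPU; /* unknown GPU id: fall back to CPU */

	/* xnack on: only AIP GPUs constrain the choice; ACCESS GPUs can fault. */
	candidates = xnack_enabled ? aip_mask : (access_mask | aip_mask);

	for (i = 0; i < ngpus && i < 32; i++) {
		if (!(candidates & (UINT32_C(1) << i)) || &gpus[i] == target)
			continue;
		/* Any mapping GPU outside the target's hive forces CPU. */
		if (!model_same_hive(&gpus[i], target))
			return MODEL_LOC_CPU;
	}
	return prefetch_loc;
}
```

In this model, a range that a second GPU outside the hive only has ACCESS to keeps the requested GPU location with xnack on, because that GPU can fault and trigger migration, but falls back to the CPU with xnack off. That is the behavioural difference the `bitmap_copy()` versus `bitmap_or()` branch introduces.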