drm/xe: improve hibernation on igpu

The GGTT looks to be stored inside stolen memory on igpu, which is not
treated as normal RAM. The core kernel skips this memory range when
creating the hibernation image, so the GGTT programming is lost when
coming back from hibernation. This seems to break resume, with the GuC
FW failing to load:

[drm] *ERROR* GT0: load failed: status = 0x400000A0, time = 10ms, freq = 1250MHz (req 1300MHz), done = -1
[drm] *ERROR* GT0: load failed: status: Reset = 0, BootROM = 0x50, UKernel = 0x00, MIA = 0x00, Auth = 0x01
[drm] *ERROR* GT0: firmware signature verification failed
[drm] *ERROR* CRITICAL: Xe has declared device 0000:00:02.0 as wedged.

Current GGTT users are kernel internal and tracked as pinned, so it
should be possible to hook into the existing save/restore logic that we
use for dgpu, where the actual evict is skipped but on restore we
importantly restore the GGTT programming. This has been confirmed to
fix hibernation on at least ADL and MTL, though likely all igpu
platforms are affected.
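
As a rough illustration of the tracking change described above, the sketch below models the condition under which a pinned BO is added to the pinned list, before and after this patch. The struct and its boolean fields are hypothetical stand-ins for the driver checks named in the comments; this is not driver code.

```c
#include <stdbool.h>

/* Hypothetical model of the state consulted in xe_bo_pin(). */
struct bo_model {
	bool is_dgfx;       /* models IS_DGFX(xe) */
	bool internal_test; /* models CONFIG_DRM_XE_DEBUG && XE_BO_FLAG_INTERNAL_TEST */
	bool in_vram;       /* models mem_type_is_vram(place->mem_type) */
	bool has_ggtt;      /* models bo->flags & XE_BO_FLAG_GGTT */
};

/* Before the patch: the list_add_tail() sat inside the dgpu-only branch. */
static bool tracked_before(const struct bo_model *bo)
{
	if (!bo->is_dgfx || bo->internal_test)
		return false;
	return bo->in_vram || bo->has_ggtt;
}

/* After the patch: any VRAM-placed or GGTT-mapped BO is tracked. */
static bool tracked_after(const struct bo_model *bo)
{
	return bo->in_vram || bo->has_ggtt;
}
```

In this model, a GGTT-mapped BO on igpu is untracked before the patch and tracked afterwards, which is what lets the restore path see it.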

This also means we have a hole in our testing: the existing s4 tests
only really exercise the driver hooks. They don't go as far as actually
rebooting and restoring from the hibernation image, which in turn powers
down RAM (and therefore loses the contents of stolen).

v2 (Brost)
- Remove extra newline and drop unnecessary parentheses.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3275
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241101170156.213490-2-matthew.auld@intel.com
(cherry picked from commit f2a6b8e396666d97ada8e8759dfb6a69d8df6380)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

authored by Matthew Auld and committed by Lucas De Marchi 46f1f4b0 dd886a63

Changed files
+16 -27
+16 -21
drivers/gpu/drm/xe/xe_bo.c
···
 	if (WARN_ON(!xe_bo_is_pinned(bo)))
 		return -EINVAL;
 
-	if (WARN_ON(xe_bo_is_vram(bo) || !bo->ttm.ttm))
+	if (WARN_ON(xe_bo_is_vram(bo)))
+		return -EINVAL;
+
+	if (WARN_ON(!bo->ttm.ttm && !xe_bo_is_stolen(bo)))
 		return -EINVAL;
 
 	if (!mem_type_is_vram(place->mem_type))
···
 
 int xe_bo_pin(struct xe_bo *bo)
 {
+	struct ttm_place *place = &bo->placements[0];
 	struct xe_device *xe = xe_bo_device(bo);
 	int err;
 
···
 	 */
 	if (IS_DGFX(xe) && !(IS_ENABLED(CONFIG_DRM_XE_DEBUG) &&
 	    bo->flags & XE_BO_FLAG_INTERNAL_TEST)) {
-		struct ttm_place *place = &(bo->placements[0]);
-
 		if (mem_type_is_vram(place->mem_type)) {
 			xe_assert(xe, place->flags & TTM_PL_FLAG_CONTIGUOUS);
 
···
 				       vram_region_gpu_offset(bo->ttm.resource)) >> PAGE_SHIFT;
 			place->lpfn = place->fpfn + (bo->size >> PAGE_SHIFT);
 		}
+	}
 
-		if (mem_type_is_vram(place->mem_type) ||
-		    bo->flags & XE_BO_FLAG_GGTT) {
-			spin_lock(&xe->pinned.lock);
-			list_add_tail(&bo->pinned_link, &xe->pinned.kernel_bo_present);
-			spin_unlock(&xe->pinned.lock);
-		}
+	if (mem_type_is_vram(place->mem_type) || bo->flags & XE_BO_FLAG_GGTT) {
+		spin_lock(&xe->pinned.lock);
+		list_add_tail(&bo->pinned_link, &xe->pinned.kernel_bo_present);
+		spin_unlock(&xe->pinned.lock);
 	}
 
 	ttm_bo_pin(&bo->ttm);
···
 
 void xe_bo_unpin(struct xe_bo *bo)
 {
+	struct ttm_place *place = &bo->placements[0];
 	struct xe_device *xe = xe_bo_device(bo);
 
 	xe_assert(xe, !bo->ttm.base.import_attach);
 	xe_assert(xe, xe_bo_is_pinned(bo));
 
-	if (IS_DGFX(xe) && !(IS_ENABLED(CONFIG_DRM_XE_DEBUG) &&
-	    bo->flags & XE_BO_FLAG_INTERNAL_TEST)) {
-		struct ttm_place *place = &(bo->placements[0]);
-
-		if (mem_type_is_vram(place->mem_type) ||
-		    bo->flags & XE_BO_FLAG_GGTT) {
-			spin_lock(&xe->pinned.lock);
-			xe_assert(xe, !list_empty(&bo->pinned_link));
-			list_del_init(&bo->pinned_link);
-			spin_unlock(&xe->pinned.lock);
-		}
+	if (mem_type_is_vram(place->mem_type) || bo->flags & XE_BO_FLAG_GGTT) {
+		spin_lock(&xe->pinned.lock);
+		xe_assert(xe, !list_empty(&bo->pinned_link));
+		list_del_init(&bo->pinned_link);
+		spin_unlock(&xe->pinned.lock);
 	}
-
 	ttm_bo_unpin(&bo->ttm);
 }
 
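
The first hunk above splits one sanity check into two, so that a stolen-memory BO with no `bo->ttm.ttm` is no longer rejected. A minimal sketch of that before/after validation follows, assuming a hypothetical struct whose booleans stand in for the helpers named in the comments; it models only the checks visible in the hunk, not the rest of the function.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flags standing in for the helpers used in the hunk. */
struct restore_model {
	bool pinned;     /* models xe_bo_is_pinned(bo) */
	bool is_vram;    /* models xe_bo_is_vram(bo) */
	bool has_ttm_tt; /* models bo->ttm.ttm being non-NULL */
	bool is_stolen;  /* models xe_bo_is_stolen(bo) */
};

/* Before the patch: a BO without a ttm_tt was always rejected. */
static int validate_before(const struct restore_model *bo)
{
	if (!bo->pinned)
		return -EINVAL;
	if (bo->is_vram || !bo->has_ttm_tt)
		return -EINVAL;
	return 0;
}

/* After the patch: a missing ttm_tt is tolerated for stolen BOs. */
static int validate_after(const struct restore_model *bo)
{
	if (!bo->pinned)
		return -EINVAL;
	if (bo->is_vram)
		return -EINVAL;
	if (!bo->has_ttm_tt && !bo->is_stolen)
		return -EINVAL;
	return 0;
}
```

With this model, a pinned stolen BO without a ttm_tt fails the old check with -EINVAL and passes the new one, while a non-stolen BO without a ttm_tt is still rejected.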
-6
drivers/gpu/drm/xe/xe_bo_evict.c
···
 	u8 id;
 	int ret;
 
-	if (!IS_DGFX(xe))
-		return 0;
-
 	/* User memory */
 	for (mem_type = XE_PL_VRAM0; mem_type <= XE_PL_VRAM1; ++mem_type) {
 		struct ttm_resource_manager *man =
···
 {
 	struct xe_bo *bo;
 	int ret;
-
-	if (!IS_DGFX(xe))
-		return 0;
 
 	spin_lock(&xe->pinned.lock);
 	for (;;) {