Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

drm/radeon: Fixup hw vblank counter/ts for new drm_update_vblank_count() (v2)

commit 4dfd6486 "drm: Use vblank timestamps to guesstimate how many
vblanks were missed" introduced in Linux 4.4-rc1 makes the drm core
more fragile to drivers which don't update hw vblank counters and
vblank timestamps in sync with firing of the vblank irq and
essentially at leading edge of vblank.

This exposed a problem with radeon-kms/amdgpu-kms which do not
satisfy above requirements:

The vblank irq fires a few scanlines before start of vblank, but
programmed pageflips complete at start of vblank and
vblank timestamps update at start of vblank, whereas the
hw vblank counter increments only later, at start of vsync.

This leads to problems like off by one errors for vblank counter
updates, vblank counters apparently going backwards or vblank
timestamps apparently having time going backwards. The net result
is stuttering of graphics in games, or little hangs, as well as
total failure of timing sensitive applications.

See bug #93147 for an example of the regression on Linux 4.4-rc:

https://bugs.freedesktop.org/show_bug.cgi?id=93147

This patch tries to align all above events better from the
viewpoint of the drm core / of external callers to fix the problem:

1. The apparent start of vblank is shifted a few scanlines earlier,
so the vblank irq now always happens after start of this extended
vblank interval and thereby drm_update_vblank_count() always samples
the updated vblank count and timestamp of the new vblank interval.

To achieve this, the reporting of scanout positions by
radeon_get_crtc_scanoutpos() now operates as if the vblank starts
radeon_crtc->lb_vblank_lead_lines before the real start of the hw
vblank interval. This means that the vblank timestamps which are based
on these scanout positions will now update at this earlier start of
vblank.

2. The driver->get_vblank_counter() function will bump the returned
vblank count as read from the hw by +1 if the query happens after
the shifted earlier start of the vblank, but before the real hw increment
at start of vsync, so the counter appears to increment at start of vblank
in sync with the timestamp update.

3. Calls from vblank irq-context and regular non-irq calls are now
treated identical, always simulating the shifted vblank start, to
avoid inconsistent results for queries happening from vblank irq vs.
happening from drm_vblank_enable() or vblank_disable_fn().

4. The radeon_flip_work_func will delay mmio programming a pageflip until
the start of the real vblank iff it happens to execute inside the shifted
earlier start of the vblank, so pageflips now also appear to execute at
start of the shifted vblank, in sync with vblank counter and timestamp
updates. This to avoid some races between updates of vblank count and
timestamps that are used for swap scheduling and pageflip execution which
could cause pageflips to execute before the scheduled target vblank.

The lb_vblank_lead_lines "fudge" value is calculated as the size of
the display controllers line buffer in scanlines for the given video
mode: Vblank irq's are triggered by the line buffer logic when the line
buffer refill for a video frame ends, ie. when the line buffer source read
position enters the hw vblank. This means that a vblank irq could fire at
most as many scanlines before the current reported scanout position of the
crtc timing generator as the number of scanlines the line buffer can
maximally hold for a given video mode.

This patch has been successfully tested on a RV730 card with DCE-3 display
engine and on a evergreen card with DCE-4 display engine, in single-display
and dual-display configuration, with different video modes.

A similar patch is needed for amdgpu-kms to fix the same problem.

Limitations:

- Line buffer sizes in pixels are hard-coded on < DCE-4 to a value
i just guessed to be high enough to work ok, lacking info on the true
sizes atm.

Fixes: fdo#93147

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: Harry Wentland <Harry.Wentland@amd.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>

(v1) Tested-by: Dave Witbrodt <dawitbro@sbcglobal.net>

(v2) Refine radeon_flip_work_func() for better efficiency:

In radeon_flip_work_func, replace the busy waiting udelay(5)
with event lock held by a more performance and energy efficient
usleep_range() until at least predicted true start of hw vblank,
with some slack for scheduler happiness. Release the event lock
during waits to not delay other outputs in doing their stuff, as
the waiting can last up to 200 usecs in some cases.

Retested on DCE-3 and DCE-4 to verify it still works nicely.

(v2) Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

authored by

Mario Kleiner and committed by
Alex Deucher
c55d21ea ac4a9350

+164 -29
+3
drivers/gpu/drm/radeon/cik.c
··· 9630 9630 (rdev->disp_priority == 2)) { 9631 9631 DRM_DEBUG_KMS("force priority to high\n"); 9632 9632 } 9633 + 9634 + /* Save number of lines the linebuffer leads before the scanout */ 9635 + radeon_crtc->lb_vblank_lead_lines = DIV_ROUND_UP(lb_size, mode->crtc_hdisplay); 9633 9636 } 9634 9637 9635 9638 /* select wm A */
+3
drivers/gpu/drm/radeon/evergreen.c
··· 2372 2372 c.full = dfixed_div(c, a); 2373 2373 priority_b_mark = dfixed_trunc(c); 2374 2374 priority_b_cnt |= priority_b_mark & PRIORITY_MARK_MASK; 2375 + 2376 + /* Save number of lines the linebuffer leads before the scanout */ 2377 + radeon_crtc->lb_vblank_lead_lines = DIV_ROUND_UP(lb_size, mode->crtc_hdisplay); 2375 2378 } 2376 2379 2377 2380 /* select wm A */
+10
drivers/gpu/drm/radeon/r100.c
··· 3217 3217 uint32_t pixel_bytes1 = 0; 3218 3218 uint32_t pixel_bytes2 = 0; 3219 3219 3220 + /* Guess line buffer size to be 8192 pixels */ 3221 + u32 lb_size = 8192; 3222 + 3220 3223 if (!rdev->mode_info.mode_config_initialized) 3221 3224 return; 3222 3225 ··· 3634 3631 DRM_DEBUG_KMS("GRPH2_BUFFER_CNTL from to %x\n", 3635 3632 (unsigned int)RREG32(RADEON_GRPH2_BUFFER_CNTL)); 3636 3633 } 3634 + 3635 + /* Save number of lines the linebuffer leads before the scanout */ 3636 + if (mode1) 3637 + rdev->mode_info.crtcs[0]->lb_vblank_lead_lines = DIV_ROUND_UP(lb_size, mode1->crtc_hdisplay); 3638 + 3639 + if (mode2) 3640 + rdev->mode_info.crtcs[1]->lb_vblank_lead_lines = DIV_ROUND_UP(lb_size, mode2->crtc_hdisplay); 3637 3641 } 3638 3642 3639 3643 int r100_ring_test(struct radeon_device *rdev, struct radeon_ring *ring)
+79 -27
drivers/gpu/drm/radeon/radeon_display.c
··· 322 322 * to complete in this vblank? 323 323 */ 324 324 if (update_pending && 325 - (DRM_SCANOUTPOS_VALID & radeon_get_crtc_scanoutpos(rdev->ddev, crtc_id, 0, 325 + (DRM_SCANOUTPOS_VALID & radeon_get_crtc_scanoutpos(rdev->ddev, 326 + crtc_id, 327 + USE_REAL_VBLANKSTART, 326 328 &vpos, &hpos, NULL, NULL, 327 329 &rdev->mode_info.crtcs[crtc_id]->base.hwmode)) && 328 330 ((vpos >= (99 * rdev->mode_info.crtcs[crtc_id]->base.hwmode.crtc_vdisplay)/100) || ··· 403 401 struct drm_crtc *crtc = &radeon_crtc->base; 404 402 unsigned long flags; 405 403 int r; 404 + int vpos, hpos, stat, min_udelay; 405 + struct drm_vblank_crtc *vblank = &crtc->dev->vblank[work->crtc_id]; 406 406 407 407 down_read(&rdev->exclusive_lock); 408 408 if (work->fence) { ··· 440 436 441 437 /* set the proper interrupt */ 442 438 radeon_irq_kms_pflip_irq_get(rdev, radeon_crtc->crtc_id); 439 + 440 + /* If this happens to execute within the "virtually extended" vblank 441 + * interval before the start of the real vblank interval then it needs 442 + * to delay programming the mmio flip until the real vblank is entered. 443 + * This prevents completing a flip too early due to the way we fudge 444 + * our vblank counter and vblank timestamps in order to work around the 445 + * problem that the hw fires vblank interrupts before actual start of 446 + * vblank (when line buffer refilling is done for a frame). It 447 + * complements the fudging logic in radeon_get_crtc_scanoutpos() for 448 + * timestamping and radeon_get_vblank_counter_kms() for vblank counts. 449 + * 450 + * In practice this won't execute very often unless on very fast 451 + * machines because the time window for this to happen is very small. 452 + */ 453 + for (;;) { 454 + /* GET_DISTANCE_TO_VBLANKSTART returns distance to real vblank 455 + * start in hpos, and to the "fudged earlier" vblank start in 456 + * vpos. 457 + */ 458 + stat = radeon_get_crtc_scanoutpos(rdev->ddev, work->crtc_id, 459 + GET_DISTANCE_TO_VBLANKSTART, 460 + &vpos, &hpos, NULL, NULL, 461 + &crtc->hwmode); 462 + 463 + if ((stat & (DRM_SCANOUTPOS_VALID | DRM_SCANOUTPOS_ACCURATE)) != 464 + (DRM_SCANOUTPOS_VALID | DRM_SCANOUTPOS_ACCURATE) || 465 + !(vpos >= 0 && hpos <= 0)) 466 + break; 467 + 468 + /* Sleep at least until estimated real start of hw vblank */ 469 + spin_unlock_irqrestore(&crtc->dev->event_lock, flags); 470 + min_udelay = (-hpos + 1) * max(vblank->linedur_ns / 1000, 5); 471 + usleep_range(min_udelay, 2 * min_udelay); 472 + spin_lock_irqsave(&crtc->dev->event_lock, flags); 473 + }; 443 474 444 475 /* do the flip (mmio) */ 445 476 radeon_page_flip(rdev, radeon_crtc->crtc_id, work->base); ··· 1807 1768 * \param dev Device to query. 1808 1769 * \param crtc Crtc to query. 1809 1770 * \param flags Flags from caller (DRM_CALLED_FROM_VBLIRQ or 0). 1771 + * For driver internal use only also supports these flags: 1772 + * 1773 + * USE_REAL_VBLANKSTART to use the real start of vblank instead 1774 + * of a fudged earlier start of vblank. 1775 + * 1776 + * GET_DISTANCE_TO_VBLANKSTART to return distance to the 1777 + * fudged earlier start of vblank in *vpos and the distance 1778 + * to true start of vblank in *hpos. 1779 + * 1810 1780 * \param *vpos Location where vertical scanout position should be stored. 1811 1781 * \param *hpos Location where horizontal scanout position should go. 1812 1782 * \param *stime Target location for timestamp taken immediately before ··· 1959 1911 vbl_end = 0; 1960 1912 } 1961 1913 1914 + /* Called from driver internal vblank counter query code? */ 1915 + if (flags & GET_DISTANCE_TO_VBLANKSTART) { 1916 + /* Caller wants distance from real vbl_start in *hpos */ 1917 + *hpos = *vpos - vbl_start; 1918 + } 1919 + 1920 + /* Fudge vblank to start a few scanlines earlier to handle the 1921 + * problem that vblank irqs fire a few scanlines before start 1922 + * of vblank. Some driver internal callers need the true vblank 1923 + * start to be used and signal this via the USE_REAL_VBLANKSTART flag. 1924 + * 1925 + * The cause of the "early" vblank irq is that the irq is triggered 1926 + * by the line buffer logic when the line buffer read position enters 1927 + * the vblank, whereas our crtc scanout position naturally lags the 1928 + * line buffer read position. 1929 + */ 1930 + if (!(flags & USE_REAL_VBLANKSTART)) 1931 + vbl_start -= rdev->mode_info.crtcs[pipe]->lb_vblank_lead_lines; 1932 + 1962 1933 /* Test scanout position against vblank region. */ 1963 1934 if ((*vpos < vbl_start) && (*vpos >= vbl_end)) 1964 1935 in_vbl = false; 1936 + 1937 + /* In vblank? */ 1938 + if (in_vbl) 1939 + ret |= DRM_SCANOUTPOS_IN_VBLANK; 1940 + 1941 + /* Called from driver internal vblank counter query code? */ 1942 + if (flags & GET_DISTANCE_TO_VBLANKSTART) { 1943 + /* Caller wants distance from fudged earlier vbl_start */ 1944 + *vpos -= vbl_start; 1945 + return ret; 1946 + } 1965 1947 1966 1948 /* Check if inside vblank area and apply corrective offsets: 1967 1949 * vpos will then be >=0 in video scanout area, but negative ··· 2007 1929 2008 1930 /* Correct for shifted end of vbl at vbl_end. */ 2009 1931 *vpos = *vpos - vbl_end; 2010 - 2011 - /* In vblank? */ 2012 - if (in_vbl) 2013 - ret |= DRM_SCANOUTPOS_IN_VBLANK; 2014 - 2015 - /* Is vpos outside nominal vblank area, but less than 2016 - * 1/100 of a frame height away from start of vblank? 2017 - * If so, assume this isn't a massively delayed vblank 2018 - * interrupt, but a vblank interrupt that fired a few 2019 - * microseconds before true start of vblank. Compensate 2020 - * by adding a full frame duration to the final timestamp. 2021 - * Happens, e.g., on ATI R500, R600. 2022 - * 2023 - * We only do this if DRM_CALLED_FROM_VBLIRQ. 2024 - */ 2025 - if ((flags & DRM_CALLED_FROM_VBLIRQ) && !in_vbl) { 2026 - vbl_start = mode->crtc_vdisplay; 2027 - vtotal = mode->crtc_vtotal; 2028 - 2029 - if (vbl_start - *vpos < vtotal / 100) { 2030 - *vpos -= vtotal; 2031 - 2032 - /* Signal this correction as "applied". */ 2033 - ret |= 0x8; 2034 - } 2035 - } 2036 1932 2037 1933 return ret; 2038 1934 }
+49 -1
drivers/gpu/drm/radeon/radeon_kms.c
··· 755 755 */ 756 756 u32 radeon_get_vblank_counter_kms(struct drm_device *dev, int crtc) 757 757 { 758 + int vpos, hpos, stat; 759 + u32 count; 758 760 struct radeon_device *rdev = dev->dev_private; 759 761 760 762 if (crtc < 0 || crtc >= rdev->num_crtc) { ··· 764 762 return -EINVAL; 765 763 } 766 764 767 - return radeon_get_vblank_counter(rdev, crtc); 765 + /* The hw increments its frame counter at start of vsync, not at start 766 + * of vblank, as is required by DRM core vblank counter handling. 767 + * Cook the hw count here to make it appear to the caller as if it 768 + * incremented at start of vblank. We measure distance to start of 769 + * vblank in vpos. vpos therefore will be >= 0 between start of vblank 770 + * and start of vsync, so vpos >= 0 means to bump the hw frame counter 771 + * result by 1 to give the proper appearance to caller. 772 + */ 773 + if (rdev->mode_info.crtcs[crtc]) { 774 + /* Repeat readout if needed to provide stable result if 775 + * we cross start of vsync during the queries. 776 + */ 777 + do { 778 + count = radeon_get_vblank_counter(rdev, crtc); 779 + /* Ask radeon_get_crtc_scanoutpos to return vpos as 780 + * distance to start of vblank, instead of regular 781 + * vertical scanout pos. 782 + */ 783 + stat = radeon_get_crtc_scanoutpos( 784 + dev, crtc, GET_DISTANCE_TO_VBLANKSTART, 785 + &vpos, &hpos, NULL, NULL, 786 + &rdev->mode_info.crtcs[crtc]->base.hwmode); 787 + } while (count != radeon_get_vblank_counter(rdev, crtc)); 788 + 789 + if (((stat & (DRM_SCANOUTPOS_VALID | DRM_SCANOUTPOS_ACCURATE)) != 790 + (DRM_SCANOUTPOS_VALID | DRM_SCANOUTPOS_ACCURATE))) { 791 + DRM_DEBUG_VBL("Query failed! stat %d\n", stat); 792 + } 793 + else { 794 + DRM_DEBUG_VBL("crtc %d: dist from vblank start %d\n", 795 + crtc, vpos); 796 + 797 + /* Bump counter if we are at >= leading edge of vblank, 798 + * but before vsync where vpos would turn negative and 799 + * the hw counter really increments. 800 + */ 801 + if (vpos >= 0) 802 + count++; 803 + } 804 + } 805 + else { 806 + /* Fallback to use value as is. */ 807 + count = radeon_get_vblank_counter(rdev, crtc); 808 + DRM_DEBUG_VBL("NULL mode info! Returned count may be wrong.\n"); 809 + } 810 + 811 + return count; 768 812 } 769 813 770 814 /**
+4
drivers/gpu/drm/radeon/radeon_mode.h
··· 367 367 u32 line_time; 368 368 u32 wm_low; 369 369 u32 wm_high; 370 + u32 lb_vblank_lead_lines; 370 371 struct drm_display_mode hw_mode; 371 372 enum radeon_output_csc output_csc; 372 373 }; ··· 687 686 struct atom_voltage_table_entry entries[MAX_VOLTAGE_ENTRIES]; 688 687 }; 689 688 689 + /* Driver internal use only flags of radeon_get_crtc_scanoutpos() */ 690 + #define USE_REAL_VBLANKSTART (1 << 30) 691 + #define GET_DISTANCE_TO_VBLANKSTART (1 << 31) 690 692 691 693 extern void 692 694 radeon_add_atom_connector(struct drm_device *dev,
+3 -1
drivers/gpu/drm/radeon/radeon_pm.c
··· 1756 1756 */ 1757 1757 for (crtc = 0; (crtc < rdev->num_crtc) && in_vbl; crtc++) { 1758 1758 if (rdev->pm.active_crtcs & (1 << crtc)) { 1759 - vbl_status = radeon_get_crtc_scanoutpos(rdev->ddev, crtc, 0, 1759 + vbl_status = radeon_get_crtc_scanoutpos(rdev->ddev, 1760 + crtc, 1761 + USE_REAL_VBLANKSTART, 1760 1762 &vpos, &hpos, NULL, NULL, 1761 1763 &rdev->mode_info.crtcs[crtc]->base.hwmode); 1762 1764 if ((vbl_status & DRM_SCANOUTPOS_VALID) &&
+10
drivers/gpu/drm/radeon/rs690.c
··· 207 207 { 208 208 u32 tmp; 209 209 210 + /* Guess line buffer size to be 8192 pixels */ 211 + u32 lb_size = 8192; 212 + 210 213 /* 211 214 * Line Buffer Setup 212 215 * There is a single line buffer shared by both display controllers. ··· 246 243 tmp |= V_006520_DC_LB_MEMORY_SPLIT_D1_1Q_D2_3Q; 247 244 } 248 245 WREG32(R_006520_DC_LB_MEMORY_SPLIT, tmp); 246 + 247 + /* Save number of lines the linebuffer leads before the scanout */ 248 + if (mode1) 249 + rdev->mode_info.crtcs[0]->lb_vblank_lead_lines = DIV_ROUND_UP(lb_size, mode1->crtc_hdisplay); 250 + 251 + if (mode2) 252 + rdev->mode_info.crtcs[1]->lb_vblank_lead_lines = DIV_ROUND_UP(lb_size, mode2->crtc_hdisplay); 249 253 } 250 254 251 255 struct rs690_watermark {
+3
drivers/gpu/drm/radeon/si.c
··· 2376 2376 c.full = dfixed_div(c, a); 2377 2377 priority_b_mark = dfixed_trunc(c); 2378 2378 priority_b_cnt |= priority_b_mark & PRIORITY_MARK_MASK; 2379 + 2380 + /* Save number of lines the linebuffer leads before the scanout */ 2381 + radeon_crtc->lb_vblank_lead_lines = DIV_ROUND_UP(lb_size, mode->crtc_hdisplay); 2379 2382 } 2380 2383 2381 2384 /* select wm A */