Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled

The MBA software controller (mba_sc) is a feedback loop that
periodically reads MBM counters and tries to keep the bandwidth below
a user-specified value. It piggybacks on the MBM counter overflow
handler, performing its updates at a 1s interval in mbm_update() and
update_mba_bw().

The purpose of mbm_update() is to periodically read the MBM counters so
that a hardware counter doesn't wrap around more than once between user
samplings. For local bandwidth, mbm_update() calls __mon_event_count()
when mba_sc is not enabled, but calls mbm_bw_count() instead when mba_sc
is enabled. With mba_sc enabled, __mon_event_count() is therefore not
called for local bandwidth in the MBM counter overflow handler, but it
is still called when the MBM local bandwidth counter file
'mbm_local_bytes' is read. The call path is:

  rdtgroup_mondata_show()
    mon_event_read()
      mon_event_count()
        __mon_event_count()
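
To illustrate why the counters must be read often enough, here is a minimal sketch of one common way to take a wraparound-safe delta between two samples of a narrow counter. The name overflow_count() and the 24-bit width used in the note below are illustrative assumptions; this is not a copy of the kernel's mbm_overflow_count().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Wraparound-safe difference between two samples of a counter that is
 * only `width` bits wide (hypothetical helper for illustration).
 * Shifting both samples to the top of a 64-bit word makes the unsigned
 * subtraction wrap at the same point the narrow counter does.
 */
static uint64_t overflow_count(uint64_t prev, uint64_t cur, unsigned int width)
{
	unsigned int shift;

	assert(width >= 1 && width <= 64);
	shift = 64 - width;
	return ((cur << shift) - (prev << shift)) >> shift;
}
```

With a 24-bit counter, overflow_count(0xFFFFF0, 0x10, 24) yields 0x20 even though the raw value went backwards. If the counter wraps a full extra time between samples, the two samples can be identical and the delta is 0, which is the case the periodic read is meant to prevent.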

In __mon_event_count(), m->chunks is increased by the delta chunks
calculated from the previous MSR value (m->prev_msr) and the current MSR
value. When mba_sc is enabled, m->chunks is also mistakenly increased in
mbm_bw_count(), called from mbm_update(), by the delta chunks calculated
from m->prev_bw_msr instead of m->prev_msr, even though m->chunks is not
used by update_mba_bw() in the mba_sc feedback loop.
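
The double accumulation described above can be reproduced with a small user-space model. This is a sketch under stated assumptions: struct mbm_state_sketch, read_path(), bw_path() and simulate() are hypothetical stand-ins, the counter advances by a made-up 100 chunks per tick, and wraparound is ignored for brevity.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-RMID state discussed above. */
struct mbm_state_sketch {
	uint64_t chunks;      /* running total reported to the user */
	uint64_t prev_msr;    /* baseline of the user-read path */
	uint64_t prev_bw_msr; /* baseline of the mba_sc path */
};

/* User-read path: accumulate the delta since prev_msr. */
static void read_path(struct mbm_state_sketch *m, uint64_t msr)
{
	m->chunks += msr - m->prev_msr;
	m->prev_msr = msr;
}

/* mba_sc path; `buggy` keeps the extra update of m->chunks. */
static uint64_t bw_path(struct mbm_state_sketch *m, uint64_t msr, int buggy)
{
	uint64_t delta = msr - m->prev_bw_msr;

	if (buggy)
		m->chunks += delta; /* same traffic counted a second time */
	m->prev_bw_msr = msr;
	return delta;
}

/* Feed both paths the same counter samples; return the final total. */
static uint64_t simulate(int buggy)
{
	struct mbm_state_sketch m = { 0, 0, 0 };
	uint64_t msr = 0;
	int i;

	for (i = 0; i < 10; i++) {
		msr += 100;              /* assumed traffic per tick */
		bw_path(&m, msr, buggy); /* periodic overflow handler */
		read_path(&m, msr);      /* user reads the counter file */
	}
	return m.chunks;
}
```

In this model simulate(0) ends at 1000 chunks while simulate(1) ends at 2000, because each path adds its own delta for the same traffic. That factor of two is consistent with the roughly doubled bandwidth in the "Before fix" measurement in the test steps.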

As a result, reading the MBM local bandwidth counter file returns a
value computed from an m->chunks that mbm_bw_count() has changed
unexpectedly, so an incorrect local bandwidth counter is shown to the
user.

Fix this by removing the incorrect m->chunks update from mbm_bw_count()
in the MBM counter overflow handler, and by always calling
__mon_event_count() in mbm_update() to make sure that the hardware local
bandwidth counter doesn't wrap around more than once between reads.

Test steps:
# Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat && make
./tools/membw/membw -c 0 -b 10000 --read

# Enable MBA software controller
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

# Create control group c1
mkdir /sys/fs/resctrl/c1

# Set MB throttle to 6 GB/s
echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata

# Write PID of the workload to tasks file
echo `pidof membw` > /sys/fs/resctrl/c1/tasks

# Read the local bytes counter twice, 1s apart. The calculated local
# bandwidth is expected to approach 6 GB/s, but before the fix it does not:
local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
sleep 1
local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
echo "local b/w (bytes/s):" `expr $local_2 - $local_1`

Before fix:
local b/w (bytes/s): 11076796416

After fix:
local b/w (bytes/s): 5465014272
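
To compare these byte rates with the 6000 written to the schemata file, a small helper can convert them. This sketch assumes the mba_MBps unit matches the >> 20 scaling applied to cur_bw in the patch; bytes_to_mbps() and within_limit() are hypothetical names.

```c
#include <stdint.h>

/* Convert bytes/s to MiB/s, mirroring the >> 20 scaling (assumption). */
static uint64_t bytes_to_mbps(uint64_t bytes_per_sec)
{
	return bytes_per_sec >> 20;
}

/* Nonzero when the measured rate is at or below the configured limit. */
static int within_limit(uint64_t bytes_per_sec, uint64_t limit_mbps)
{
	return bytes_to_mbps(bytes_per_sec) <= limit_mbps;
}
```

Under that assumption the "Before fix" reading converts to 10563, well above the limit of 6000, and the "After fix" reading converts to 5211, which is below it.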

Fixes: ba0f26d8529c ("x86/intel_rdt/mba_sc: Prepare for feedback loop")
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/1607063279-19437-1-git-send-email-xiaochen.shen@intel.com

Authored by Xiaochen Shen and committed by Borislav Petkov
06c5fe9b 29ac40cb

+2 -4
arch/x86/kernel/cpu/resctrl/monitor.c
···
 		return;

 	chunks = mbm_overflow_count(m->prev_bw_msr, tval, rr->r->mbm_width);
-	m->chunks += chunks;
 	cur_bw = (chunks * r->mon_scale) >> 20;

 	if (m->delta_comp)
···
 	}
 	if (is_mbm_local_enabled()) {
 		rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
+		__mon_event_count(rmid, &rr);

 		/*
 		 * Call the MBA software controller only for the
 		 * control groups and when user has enabled
 		 * the software controller explicitly.
 		 */
-		if (!is_mba_sc(NULL))
-			__mon_event_count(rmid, &rr);
-		else
+		if (is_mba_sc(NULL))
 			mbm_bw_count(rmid, &rr);
 	}
 }