Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'pm-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
"There are quite a few interesting things here, including new hardware
support, new features, some bug fixes and documentation updates. In
addition, there is the usual bunch of minor fixes and cleanups all
over.

In the new hardware support category, there are intel_pstate and
intel_rapl driver updates to support new processors, Panther Lake,
Wildcat Lake, Nova Lake, and Diamond Rapids in the OOB mode, OPP and
bandwidth allocation support in the tegra186 cpufreq driver, and
JH7110S SoC support in dt-platdev cpufreq.

The new features are the PM QoS CPU latency limit for suspend-to-idle,
the netlink support for the energy model management, support for
terminating system suspend via a wakeup event during the sync of file
systems, configurable number of hibernation compression threads, the
runtime PM auto-cleanup macros, and the "poweroff" PM event that is
expected to be used during system shutdown.

Bugs are mostly fixed in cpuidle governors, but there are also fixes
elsewhere, like in the amd-pstate cpufreq driver.

Documentation updates include, but are not limited to, a new doc on
debugging shutdown hangs, cross-referencing fixes and cleanups in the
intel_pstate documentation, and updates of comments in the core
hibernation code.

Specifics:

- Introduce and document a QoS limit on CPU exit latency during
wakeup from suspend-to-idle (Ulf Hansson)

- Add support for building libcpupower statically (Zuo An)

- Add support for sending netlink notifications to user space on
energy model updates (Changwoo Min, Peng Fan)

- Minor improvements to the Rust OPP interface (Tamir Duberstein)

- Fixes to scope-based pointers in the OPP library (Viresh Kumar)

- Use residency threshold in polling state override decisions in the
menu cpuidle governor (Aboorva Devarajan)

- Add sanity check for exit latency and target residency in the
cpuidle core (Rafael Wysocki)

- Use this_cpu_ptr() where possible in the teo governor (Christian
Loehle)

- Rework the handling of tick wakeups in the teo cpuidle governor to
increase the likelihood of stopping the scheduler tick in the cases
when tick wakeups can be counted as non-timer ones (Rafael Wysocki)

- Fix a reverse condition in the teo cpuidle governor and drop a
misguided target residency check from it (Rafael Wysocki)

- Clean up multiple minor defects in the teo cpuidle governor (Rafael
Wysocki)

- Update header inclusion to make it follow the Include What You Use
principle (Andy Shevchenko)

- Enable MSR-based RAPL PMU support in the intel_rapl power capping
driver and arrange for using it on the Panther Lake and Wildcat
Lake processors (Kuppuswamy Sathyanarayanan)

- Add support for Nova Lake and Wildcat Lake processors to the
intel_rapl power capping driver (Kaushlendra Kumar, Srinivas
Pandruvada)

- Add OPP and bandwidth support for Tegra186 (Aaron Kling)

- Optimizations for parameter array handling in the amd-pstate
cpufreq driver (Mario Limonciello)

- Fix for mode changes with offline CPUs in the amd-pstate cpufreq
driver (Gautham Shenoy)

- Preserve freq_table_sorted across suspend/hibernate in the cpufreq
core (Zihuan Zhang)

- Adjust energy model rules for Intel hybrid platforms in the
intel_pstate cpufreq driver and improve printing of debug messages
in it (Rafael Wysocki)

- Replace deprecated strcpy() in cpufreq_unregister_governor()
(Thorsten Blum)

- Fix duplicate hyperlink target errors in the intel_pstate cpufreq
driver documentation and use :ref: directive for internal linking
in it (Swaraj Gaikwad, Bagas Sanjaya)

- Add Diamond Rapids OOB mode support to the intel_pstate cpufreq
driver (Kuppuswamy Sathyanarayanan)

- Use mutex guard for driver locking in the intel_pstate driver and
eliminate some code duplication from it (Rafael Wysocki)

- Replace udelay() with usleep_range() in ACPI cpufreq (Kaushlendra
Kumar)

- Minor improvements to various cpufreq drivers (Christian Marangi,
Hal Feng, Jie Zhan, Marco Crivellari, Miaoqian Lin, and Shuhao Fu)

- Replace snprintf() with scnprintf() in show_trace_dev_match()
(Kaushlendra Kumar)

- Fix memory allocation error handling in pm_vt_switch_required()
(Malaya Kumar Rout)

- Introduce CALL_PM_OP() macro and use it to simplify code in generic
PM operations (Kaushlendra Kumar)

- Add module param to backtrace all CPUs in the device power
management watchdog (Sergey Senozhatsky)

- Rework message printing in swsusp_save() (Rafael Wysocki)

- Make it possible to change the number of hibernation compression
threads (Xueqin Luo)

- Clarify that only cgroup1 freezer uses PM freezer (Tejun Heo)

- Add document on debugging shutdown hangs to PM documentation and
correct a mistaken configuration option in it (Mario Limonciello)

- Shut down wakeup source timer before removing the wakeup source
from the list (Kaushlendra Kumar, Rafael Wysocki)

- Introduce new PMSG_POWEROFF event for system shutdown handling with
the help of PM device callbacks (Mario Limonciello)

- Make pm_test delay interruptible by wakeup events (Riwen Lu)

- Clean up kernel-doc comment style usage in the core hibernation
code and remove unhelpful comments from it (Sunday Adelodun, Rafael
Wysocki)

- Add support for handling wakeup events and aborting the suspend
process while it is syncing file systems (Samuel Wu, Rafael
Wysocki)

- Add WQ_UNBOUND to pm_wq workqueue (Marco Crivellari)

- Add runtime PM wrapper macros for ACQUIRE()/ACQUIRE_ERR() and use
them in the PCI core and the ACPI TAD driver (Rafael Wysocki)

- Improve runtime PM in the ACPI TAD driver (Rafael Wysocki)

- Update pm_runtime_allow/forbid() documentation (Rafael Wysocki)

- Fix typos in runtime.c comments (Malaya Kumar Rout)

- Move governor.h from devfreq under include/linux/ and rename it to
devfreq-governor.h to allow devfreq governor definitions outside of
drivers/devfreq/ (Dmitry Baryshkov)

- Use min() to improve readability in tegra30-devfreq.c (Thorsten
Blum)

- Fix potential use-after-free issue of OPP handling in
hisi_uncore_freq.c (Pengjie Zhang)

- Fix typo in DFSO_DOWNDIFFERENTIAL macro name in
governor_simpleondemand.c in devfreq (Riwen Lu)"

* tag 'pm-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (96 commits)
PM / devfreq: Fix typo in DFSO_DOWNDIFFERENTIAL macro name
cpuidle: Warn instead of bailing out if target residency check fails
cpuidle: Update header inclusion
Documentation: power/cpuidle: Document the CPU system wakeup latency QoS
cpuidle: Respect the CPU system wakeup QoS limit for cpuidle
sched: idle: Respect the CPU system wakeup QoS limit for s2idle
pmdomain: Respect the CPU system wakeup QoS limit for cpuidle
pmdomain: Respect the CPU system wakeup QoS limit for s2idle
PM: QoS: Introduce a CPU system wakeup QoS limit
cpuidle: governors: teo: Add missing space to the description
PM: hibernate: Extra cleanup of comments in swap handling code
PM / devfreq: tegra30: use min to simplify actmon_cpu_to_emc_rate
PM / devfreq: hisi: Fix potential UAF in OPP handling
PM / devfreq: Move governor.h to a public header location
powercap: intel_rapl: Enable MSR-based RAPL PMU support
powercap: intel_rapl: Prepare read_raw() interface for atomic-context callers
cpufreq: qcom-nvmem: fix compilation warning for qcom_cpufreq_ipq806x_match_list
PM: sleep: Call pm_sleep_fs_sync() instead of ksys_sync_helper()
PM: sleep: Add support for wakeup during filesystem sync
cpufreq: ACPI: Replace udelay() with usleep_range()
...

+2143 -829
+16
Documentation/ABI/testing/sysfs-power
··· 454 454 disables it. Reads from the file return the current value. 455 455 The default is "1" if the build-time "SUSPEND_SKIP_SYNC" config 456 456 flag is unset, or "0" otherwise. 457 + 458 + What: /sys/power/hibernate_compression_threads 459 + Date: October 2025 460 + Contact: <luoxueqin@kylinos.cn> 461 + Description: 462 + Controls the number of threads used for compression 463 + and decompression of hibernation images. 464 + 465 + The value can be adjusted at runtime to balance 466 + performance and CPU utilization. 467 + 468 + The change takes effect on the next hibernation or 469 + resume operation. 470 + 471 + Minimum value: 1 472 + Default value: 3
+10
Documentation/admin-guide/kernel-parameters.txt
··· 1907 1907 /sys/power/pm_test). Only available when CONFIG_PM_DEBUG 1908 1908 is set. Default value is 5. 1909 1909 1910 + hibernate_compression_threads= 1911 + [HIBERNATION] 1912 + Set the number of threads used for compressing or decompressing 1913 + hibernation images. 1914 + 1915 + Format: <integer> 1916 + Default: 3 1917 + Minimum: 1 1918 + Example: hibernate_compression_threads=4 1919 + 1910 1920 highmem=nn[KMG] [KNL,BOOT,EARLY] forces the highmem zone to have an exact 1911 1921 size of <nn>. This works even on boxes that have no 1912 1922 highmem otherwise. This also works to reduce highmem
+9
Documentation/admin-guide/pm/cpuidle.rst
··· 580 580 they are allowed to select for that CPU. They should never select any idle 581 581 states with exit latency beyond that limit. 582 582 583 + While the above CPU QoS constraints apply to CPU idle time management, user 584 + space may also request a CPU system wakeup latency QoS limit, via the 585 + `cpu_wakeup_latency` file. This QoS constraint is respected when selecting a 586 + suitable idle state for the CPUs, while entering the system-wide suspend-to-idle 587 + sleep state, but also to the regular CPU idle time management. 588 + 589 + Note that, the management of the `cpu_wakeup_latency` file works according to 590 + the 'cpu_dma_latency' file from user space point of view. Moreover, the unit 591 + is also microseconds. 583 592 584 593 Idle States Control Via Kernel Command Line 585 594 ===========================================
+74 -59
Documentation/admin-guide/pm/intel_pstate.rst
··· 48 48 command line. However, its configuration can be adjusted via ``sysfs`` to a 49 49 great extent. In some configurations it even is possible to unregister it via 50 50 ``sysfs`` which allows another ``CPUFreq`` scaling driver to be loaded and 51 - registered (see `below <status_attr_>`_). 51 + registered (see :ref:`below <status_attr>`). 52 52 53 + .. _operation_modes: 53 54 54 55 Operation Modes 55 56 =============== ··· 62 61 a certain performance scaling algorithm. Which of them will be in effect 63 62 depends on what kernel command line options are used and on the capabilities of 64 63 the processor. 64 + 65 + .. _active_mode: 65 66 66 67 Active Mode 67 68 ----------- ··· 97 94 Namely, if that option is set, the ``performance`` algorithm will be used by 98 95 default, and the other one will be used by default if it is not set. 99 96 97 + .. _active_mode_hwp: 98 + 100 99 Active Mode With HWP 101 100 ~~~~~~~~~~~~~~~~~~~~ 102 101 ··· 128 123 internal P-state selection logic is expected to focus entirely on performance. 129 124 130 125 This will override the EPP/EPB setting coming from the ``sysfs`` interface 131 - (see `Energy vs Performance Hints`_ below). Moreover, any attempts to change 126 + (see :ref:`energy_performance_hints` below). Moreover, any attempts to change 132 127 the EPP/EPB to a value different from 0 ("performance") via ``sysfs`` in this 133 128 configuration will be rejected. 134 129 ··· 196 191 This is the default P-state selection algorithm if the 197 192 :c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option 198 193 is not set. 194 + 195 + .. _passive_mode: 199 196 200 197 Passive Mode 201 198 ------------ ··· 296 289 the entire range of available P-states, including the whole turbo range, to the 297 290 ``CPUFreq`` core and (in the passive mode) to generic scaling governors. 
This 298 291 generally causes turbo P-states to be set more often when ``intel_pstate`` is 299 - used relative to ACPI-based CPU performance scaling (see `below <acpi-cpufreq_>`_ 300 - for more information). 292 + used relative to ACPI-based CPU performance scaling (see 293 + :ref:`below <acpi-cpufreq>` for more information). 301 294 302 295 Moreover, since ``intel_pstate`` always knows what the real turbo threshold is 303 296 (even if the Configurable TDP feature is enabled in the processor), its 304 - ``no_turbo`` attribute in ``sysfs`` (described `below <no_turbo_attr_>`_) should 297 + ``no_turbo`` attribute in ``sysfs`` (described :ref:`below <no_turbo_attr>`) should 305 298 work as expected in all cases (that is, if set to disable turbo P-states, it 306 299 always should prevent ``intel_pstate`` from using them). 307 300 ··· 314 307 315 308 * The minimum supported P-state. 316 309 317 - * The maximum supported `non-turbo P-state <turbo_>`_. 310 + * The maximum supported :ref:`non-turbo P-state <turbo>`. 318 311 319 312 * Whether or not turbo P-states are supported at all. 320 313 321 - * The maximum supported `one-core turbo P-state <turbo_>`_ (if turbo P-states 322 - are supported). 314 + * The maximum supported :ref:`one-core turbo P-state <turbo>` (if turbo 315 + P-states are supported). 323 316 324 317 * The scaling formula to translate the driver's internal representation 325 318 of P-states into frequencies and the other way around. ··· 407 400 408 401 If ``CONFIG_ENERGY_MODEL`` has been set during kernel configuration and 409 402 ``intel_pstate`` runs on a hybrid processor without SMT, in addition to enabling 410 - `CAS <CAS_>`_ it registers an Energy Model for the processor. This allows the 403 + :ref:`CAS` it registers an Energy Model for the processor. 
This allows the 411 404 Energy-Aware Scheduling (EAS) support to be enabled in the CPU scheduler if 412 405 ``schedutil`` is used as the ``CPUFreq`` governor which requires ``intel_pstate`` 413 - to operate in the `passive mode <Passive Mode_>`_. 406 + to operate in the :ref:`passive mode <passive_mode>`. 414 407 415 408 The Energy Model registered by ``intel_pstate`` is artificial (that is, it is 416 409 based on abstract cost values and it does not include any real power numbers) ··· 439 432 User Space Interface in ``sysfs`` 440 433 ================================= 441 434 435 + .. _global_attributes: 436 + 442 437 Global Attributes 443 438 ----------------- 444 439 ··· 453 444 454 445 ``max_perf_pct`` 455 446 Maximum P-state the driver is allowed to set in percent of the 456 - maximum supported performance level (the highest supported `turbo 457 - P-state <turbo_>`_). 447 + maximum supported performance level (the highest supported :ref:`turbo 448 + P-state <turbo>`). 458 449 459 450 This attribute will not be exposed if the 460 451 ``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel ··· 462 453 463 454 ``min_perf_pct`` 464 455 Minimum P-state the driver is allowed to set in percent of the 465 - maximum supported performance level (the highest supported `turbo 466 - P-state <turbo_>`_). 456 + maximum supported performance level (the highest supported :ref:`turbo 457 + P-state <turbo>`). 467 458 468 459 This attribute will not be exposed if the 469 460 ``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel ··· 472 463 ``num_pstates`` 473 464 Number of P-states supported by the processor (between 0 and 255 474 465 inclusive) including both turbo and non-turbo P-states (see 475 - `Turbo P-states Support`_). 466 + :ref:`turbo`). 476 467 477 468 This attribute is present only if the value exposed by it is the same 478 469 for all of the CPUs in the system. 
479 470 480 471 The value of this attribute is not affected by the ``no_turbo`` 481 - setting described `below <no_turbo_attr_>`_. 472 + setting described :ref:`below <no_turbo_attr>`. 482 473 483 474 This attribute is read-only. 484 475 485 476 ``turbo_pct`` 486 - Ratio of the `turbo range <turbo_>`_ size to the size of the entire 477 + Ratio of the :ref:`turbo range <turbo>` size to the size of the entire 487 478 range of supported P-states, in percent. 488 479 489 480 This attribute is present only if the value exposed by it is the same ··· 495 486 496 487 ``no_turbo`` 497 488 If set (equal to 1), the driver is not allowed to set any turbo P-states 498 - (see `Turbo P-states Support`_). If unset (equal to 0, which is the 489 + (see :ref:`turbo`). If unset (equal to 0, which is the 499 490 default), turbo P-states can be set by the driver. 500 491 [Note that ``intel_pstate`` does not support the general ``boost`` 501 492 attribute (supported by some other scaling drivers) which is replaced ··· 504 495 This attribute does not affect the maximum supported frequency value 505 496 supplied to the ``CPUFreq`` core and exposed via the policy interface, 506 497 but it affects the maximum possible value of per-policy P-state limits 507 - (see `Interpretation of Policy Attributes`_ below for details). 498 + (see :ref:`policy_attributes_interpretation` below for details). 508 499 509 500 ``hwp_dynamic_boost`` 510 501 This attribute is only present if ``intel_pstate`` works in the 511 - `active mode with the HWP feature enabled <Active Mode With HWP_>`_ in 502 + :ref:`active mode with the HWP feature enabled <active_mode_hwp>` in 512 503 the processor. If set (equal to 1), it causes the minimum P-state limit 513 504 to be increased dynamically for a short time whenever a task previously 514 505 waiting on I/O is selected to run on a given logical CPU (the purpose ··· 523 514 Operation mode of the driver: "active", "passive" or "off". 
524 515 525 516 "active" 526 - The driver is functional and in the `active mode 527 - <Active Mode_>`_. 517 + The driver is functional and in the :ref:`active mode 518 + <active_mode>`. 528 519 529 520 "passive" 530 - The driver is functional and in the `passive mode 531 - <Passive Mode_>`_. 521 + The driver is functional and in the :ref:`passive mode 522 + <passive_mode>`. 532 523 533 524 "off" 534 525 The driver is not functional (it is not registered as a scaling ··· 556 547 attribute to "1" enables the energy-efficiency optimizations and setting 557 548 to "0" disables them. 558 549 550 + .. _policy_attributes_interpretation: 551 + 559 552 Interpretation of Policy Attributes 560 553 ----------------------------------- 561 554 562 555 The interpretation of some ``CPUFreq`` policy attributes described in 563 556 Documentation/admin-guide/pm/cpufreq.rst is special with ``intel_pstate`` 564 557 as the current scaling driver and it generally depends on the driver's 565 - `operation mode <Operation Modes_>`_. 558 + :ref:`operation mode <operation_modes>`. 566 559 567 560 First of all, the values of the ``cpuinfo_max_freq``, ``cpuinfo_min_freq`` and 568 561 ``scaling_cur_freq`` attributes are produced by applying a processor-specific ··· 573 562 attributes are capped by the frequency corresponding to the maximum P-state that 574 563 the driver is allowed to set. 575 564 576 - If the ``no_turbo`` `global attribute <no_turbo_attr_>`_ is set, the driver is 577 - not allowed to use turbo P-states, so the maximum value of ``scaling_max_freq`` 578 - and ``scaling_min_freq`` is limited to the maximum non-turbo P-state frequency. 565 + If the ``no_turbo`` :ref:`global attribute <no_turbo_attr>` is set, the driver 566 + is not allowed to use turbo P-states, so the maximum value of 567 + ``scaling_max_freq`` and ``scaling_min_freq`` is limited to the maximum 568 + non-turbo P-state frequency. 
579 569 Accordingly, setting ``no_turbo`` causes ``scaling_max_freq`` and 580 570 ``scaling_min_freq`` to go down to that value if they were above it before. 581 571 However, the old values of ``scaling_max_freq`` and ``scaling_min_freq`` will be ··· 588 576 which also is the value of ``cpuinfo_max_freq`` in either case. 589 577 590 578 Next, the following policy attributes have special meaning if 591 - ``intel_pstate`` works in the `active mode <Active Mode_>`_: 579 + ``intel_pstate`` works in the :ref:`active mode <active_mode>`: 592 580 593 581 ``scaling_available_governors`` 594 582 List of P-state selection algorithms provided by ``intel_pstate``. ··· 609 597 Shows the base frequency of the CPU. Any frequency above this will be 610 598 in the turbo frequency range. 611 599 612 - The meaning of these attributes in the `passive mode <Passive Mode_>`_ is the 600 + The meaning of these attributes in the :ref:`passive mode <passive_mode>` is the 613 601 same as for other scaling drivers. 614 602 615 603 Additionally, the value of the ``scaling_driver`` attribute for ``intel_pstate`` 616 604 depends on the operation mode of the driver. Namely, it is either 617 - "intel_pstate" (in the `active mode <Active Mode_>`_) or "intel_cpufreq" (in the 618 - `passive mode <Passive Mode_>`_). 605 + "intel_pstate" (in the :ref:`active mode <active_mode>`) or "intel_cpufreq" 606 + (in the :ref:`passive mode <passive_mode>`). 607 + 608 + .. 
_pstate_limits_coordination: 619 609 620 610 Coordination of P-State Limits 621 611 ------------------------------ 622 612 623 613 ``intel_pstate`` allows P-state limits to be set in two ways: with the help of 624 - the ``max_perf_pct`` and ``min_perf_pct`` `global attributes 625 - <Global Attributes_>`_ or via the ``scaling_max_freq`` and ``scaling_min_freq`` 614 + the ``max_perf_pct`` and ``min_perf_pct`` :ref:`global attributes 615 + <global_attributes>` or via the ``scaling_max_freq`` and ``scaling_min_freq`` 626 616 ``CPUFreq`` policy attributes. The coordination between those limits is based 627 617 on the following rules, regardless of the current operation mode of the driver: 628 618 ··· 646 632 647 633 3. The global and per-policy limits can be set independently. 648 634 649 - In the `active mode with the HWP feature enabled <Active Mode With HWP_>`_, the 635 + In the :ref:`active mode with the HWP feature enabled <active_mode_hwp>`, the 650 636 resulting effective values are written into hardware registers whenever the 651 637 limits change in order to request its internal P-state selection logic to always 652 638 set P-states within these limits. Otherwise, the limits are taken into account 653 - by scaling governors (in the `passive mode <Passive Mode_>`_) and by the driver 654 - every time before setting a new P-state for a CPU. 639 + by scaling governors (in the :ref:`passive mode <passive_mode>`) and by the 640 + driver every time before setting a new P-state for a CPU. 655 641 656 642 Additionally, if the ``intel_pstate=per_cpu_perf_limits`` command line argument 657 643 is passed to the kernel, ``max_perf_pct`` and ``min_perf_pct`` are not exposed 658 644 at all and the only way to set the limits is by using the policy attributes. 659 645 646 + .. 
_energy_performance_hints: 660 647 661 648 Energy vs Performance Hints 662 649 --------------------------- ··· 717 702 On those systems each ``_PSS`` object returns a list of P-states supported by 718 703 the corresponding CPU which basically is a subset of the P-states range that can 719 704 be used by ``intel_pstate`` on the same system, with one exception: the whole 720 - `turbo range <turbo_>`_ is represented by one item in it (the topmost one). By 721 - convention, the frequency returned by ``_PSS`` for that item is greater by 1 MHz 722 - than the frequency of the highest non-turbo P-state listed by it, but the 705 + :ref:`turbo range <turbo>` is represented by one item in it (the topmost one). 706 + By convention, the frequency returned by ``_PSS`` for that item is greater by 707 + 1 MHz than the frequency of the highest non-turbo P-state listed by it, but the 723 708 corresponding P-state representation (following the hardware specification) 724 709 returned for it matches the maximum supported turbo P-state (or is the 725 710 special value 255 meaning essentially "go as high as you can get"). ··· 745 730 instead. 746 731 747 732 One more issue related to that may appear on systems supporting the 748 - `Configurable TDP feature <turbo_>`_ allowing the platform firmware to set the 749 - turbo threshold. Namely, if that is not coordinated with the lists of P-states 750 - returned by ``_PSS`` properly, there may be more than one item corresponding to 751 - a turbo P-state in those lists and there may be a problem with avoiding the 752 - turbo range (if desirable or necessary). Usually, to avoid using turbo 753 - P-states overall, ``acpi-cpufreq`` simply avoids using the topmost state listed 754 - by ``_PSS``, but that is not sufficient when there are other turbo P-states in 755 - the list returned by it. 733 + :ref:`Configurable TDP feature <turbo>` allowing the platform firmware to set 734 + the turbo threshold. 
Namely, if that is not coordinated with the lists of 735 + P-states returned by ``_PSS`` properly, there may be more than one item 736 + corresponding to a turbo P-state in those lists and there may be a problem with 737 + avoiding the turbo range (if desirable or necessary). Usually, to avoid using 738 + turbo P-states overall, ``acpi-cpufreq`` simply avoids using the topmost state 739 + listed by ``_PSS``, but that is not sufficient when there are other turbo 740 + P-states in the list returned by it. 756 741 757 742 Apart from the above, ``acpi-cpufreq`` works like ``intel_pstate`` in the 758 - `passive mode <Passive Mode_>`_, except that the number of P-states it can set 759 - is limited to the ones listed by the ACPI ``_PSS`` objects. 743 + :ref:`passive mode <passive_mode>`, except that the number of P-states it can 744 + set is limited to the ones listed by the ACPI ``_PSS`` objects. 760 745 761 746 762 747 Kernel Command Line Options for ``intel_pstate`` ··· 771 756 processor is supported by it. 772 757 773 758 ``active`` 774 - Register ``intel_pstate`` in the `active mode <Active Mode_>`_ to start 775 - with. 759 + Register ``intel_pstate`` in the :ref:`active mode <active_mode>` to 760 + start with. 776 761 777 762 ``passive`` 778 - Register ``intel_pstate`` in the `passive mode <Passive Mode_>`_ to 763 + Register ``intel_pstate`` in the :ref:`passive mode <passive_mode>` to 779 764 start with. 780 765 781 766 ``force`` ··· 808 793 and this option has no effect. 809 794 810 795 ``per_cpu_perf_limits`` 811 - Use per-logical-CPU P-State limits (see `Coordination of P-state 812 - Limits`_ for details). 796 + Use per-logical-CPU P-State limits (see 797 + :ref:`pstate_limits_coordination` for details). 813 798 814 799 ``no_cas`` 815 - Do not enable `capacity-aware scheduling <CAS_>`_ which is enabled by 816 - default on hybrid systems without SMT. 
800 + Do not enable :ref:`capacity-aware scheduling <CAS>` which is enabled 801 + by default on hybrid systems without SMT. 817 802 818 803 Diagnostics and Tuning 819 804 ====================== ··· 825 810 diagnostics. One of them is the ``cpu_frequency`` trace event generally used 826 811 by ``CPUFreq``, and the other one is the ``pstate_sample`` trace event specific 827 812 to ``intel_pstate``. Both of them are triggered by ``intel_pstate`` only if 828 - it works in the `active mode <Active Mode_>`_. 813 + it works in the :ref:`active mode <active_mode>`. 829 814 830 815 The following sequence of shell commands can be used to enable them and see 831 816 their output (if the kernel is generally configured to support event tracing):: ··· 837 822 gnome-terminal--4510 [001] ..s. 1177.680733: pstate_sample: core_busy=107 scaled=94 from=26 to=26 mperf=1143818 aperf=1230607 tsc=29838618 freq=2474476 838 823 cat-5235 [002] ..s. 1177.681723: cpu_frequency: state=2900000 cpu_id=2 839 824 840 - If ``intel_pstate`` works in the `passive mode <Passive Mode_>`_, the 825 + If ``intel_pstate`` works in the :ref:`passive mode <passive_mode>`, the 841 826 ``cpu_frequency`` trace event will be triggered either by the ``schedutil`` 842 827 scaling governor (for the policies it is attached to), or by the ``CPUFreq`` 843 828 core (for the policies with other scaling governors).
+113
Documentation/netlink/specs/em.yaml
··· 1 + # SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 2 + 3 + name: em 4 + 5 + doc: | 6 + Energy model netlink interface to notify its changes. 7 + 8 + protocol: genetlink 9 + 10 + uapi-header: linux/energy_model.h 11 + 12 + attribute-sets: 13 + - 14 + name: pds 15 + attributes: 16 + - 17 + name: pd 18 + type: nest 19 + nested-attributes: pd 20 + multi-attr: true 21 + - 22 + name: pd 23 + attributes: 24 + - 25 + name: pad 26 + type: pad 27 + - 28 + name: pd-id 29 + type: u32 30 + - 31 + name: flags 32 + type: u64 33 + - 34 + name: cpus 35 + type: string 36 + - 37 + name: pd-table 38 + attributes: 39 + - 40 + name: pd-id 41 + type: u32 42 + - 43 + name: ps 44 + type: nest 45 + nested-attributes: ps 46 + multi-attr: true 47 + - 48 + name: ps 49 + attributes: 50 + - 51 + name: pad 52 + type: pad 53 + - 54 + name: performance 55 + type: u64 56 + - 57 + name: frequency 58 + type: u64 59 + - 60 + name: power 61 + type: u64 62 + - 63 + name: cost 64 + type: u64 65 + - 66 + name: flags 67 + type: u64 68 + 69 + operations: 70 + list: 71 + - 72 + name: get-pds 73 + attribute-set: pds 74 + doc: Get the list of information for all performance domains. 75 + do: 76 + reply: 77 + attributes: 78 + - pd 79 + - 80 + name: get-pd-table 81 + attribute-set: pd-table 82 + doc: Get the energy model table of a performance domain. 83 + do: 84 + request: 85 + attributes: 86 + - pd-id 87 + reply: 88 + attributes: 89 + - pd-id 90 + - ps 91 + - 92 + name: pd-created 93 + doc: A performance domain is created. 94 + notify: get-pd-table 95 + mcgrp: event 96 + - 97 + name: pd-updated 98 + doc: A performance domain is updated. 99 + notify: get-pd-table 100 + mcgrp: event 101 + - 102 + name: pd-deleted 103 + doc: A performance domain is deleted. 104 + attribute-set: pd-table 105 + event: 106 + attributes: 107 + - pd-id 108 + mcgrp: event 109 + 110 + mcast-groups: 111 + list: 112 + - 113 + name: event
+1
Documentation/power/index.rst
··· 19 19 power_supply_class 20 20 runtime_pm 21 21 s2ram 22 + shutdown-debugging 22 23 suspend-and-cpuhotplug 23 24 suspend-and-interrupts 24 25 swsusp-and-swap-files
+5 -4
Documentation/power/pm_qos_interface.rst
··· 55 55 56 56 From user space: 57 57 58 - The infrastructure exposes one device node, /dev/cpu_dma_latency, for the CPU 58 + The infrastructure exposes two separate device nodes, /dev/cpu_dma_latency for 59 + the CPU latency QoS and /dev/cpu_wakeup_latency for the CPU system wakeup 59 60 latency QoS. 60 61 61 62 Only processes can register a PM QoS request. To provide for automatic ··· 64 63 parameter requests as follows. 65 64 66 65 To register the default PM QoS target for the CPU latency QoS, the process must 67 - open /dev/cpu_dma_latency. 66 + open /dev/cpu_dma_latency. To register a CPU system wakeup QoS limit, the 67 + process must open /dev/cpu_wakeup_latency. 68 68 69 69 As long as the device node is held open that process has a registered 70 70 request on the parameter. 71 71 72 72 To change the requested target value, the process needs to write an s32 value to 73 73 the open device node. Alternatively, it can write a hex string for the value 74 - using the 10 char long format e.g. "0x12345678". This translates to a 75 - cpu_latency_qos_update_request() call. 74 + using the 10 char long format e.g. "0x12345678". 76 75 77 76 To remove the user mode request for a target value simply close the device 78 77 node.
-10
Documentation/power/runtime_pm.rst
··· 480 480 `bool pm_runtime_status_suspended(struct device *dev);` 481 481 - return true if the device's runtime PM status is 'suspended' 482 482 483 - `void pm_runtime_allow(struct device *dev);` 484 - - set the power.runtime_auto flag for the device and decrease its usage 485 - counter (used by the /sys/devices/.../power/control interface to 486 - effectively allow the device to be power managed at run time) 487 - 488 - `void pm_runtime_forbid(struct device *dev);` 489 - - unset the power.runtime_auto flag for the device and increase its usage 490 - counter (used by the /sys/devices/.../power/control interface to 491 - effectively prevent the device from being power managed at run time) 492 - 493 483 `void pm_runtime_no_callbacks(struct device *dev);` 494 484 - set the power.no_callbacks flag for the device and remove the runtime 495 485 PM attributes from /sys/devices/.../power (or prevent them from being
+53
Documentation/power/shutdown-debugging.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + Debugging Kernel Shutdown Hangs with pstore 4 + +++++++++++++++++++++++++++++++++++++++++++ 5 + 6 + Overview 7 + ======== 8 + If the system hangs while shutting down, the kernel logs may need to be 9 + retrieved to debug the issue. 10 + 11 + On systems that have a UART available, it is best to configure the kernel to use 12 + this UART for kernel console output. 13 + 14 + If a UART isn't available, the ``pstore`` subsystem provides a mechanism to 15 + persist this data across a system reset, allowing it to be retrieved on the next 16 + boot. 17 + 18 + Kernel Configuration 19 + ==================== 20 + To enable ``pstore`` and enable saving kernel ring buffer logs, set the 21 + following kernel configuration options: 22 + 23 + * ``CONFIG_PSTORE=y`` 24 + * ``CONFIG_PSTORE_CONSOLE=y`` 25 + 26 + Additionally, enable a backend to store the data. Depending upon your platform 27 + some potential options include: 28 + 29 + * ``CONFIG_EFI_VARS_PSTORE=y`` 30 + * ``CONFIG_PSTORE_RAM=y`` 31 + * ``CONFIG_CHROMEOS_PSTORE=y`` 32 + * ``CONFIG_PSTORE_BLK=y`` 33 + 34 + Kernel Command-line Parameters 35 + ============================== 36 + Add these parameters to your kernel command line: 37 + 38 + * ``printk.always_kmsg_dump=Y`` 39 + * Forces the kernel to dump the entire message buffer to pstore during 40 + shutdown 41 + * ``efi_pstore.pstore_disable=N`` 42 + * For EFI-based systems, ensures the EFI backend is active 43 + 44 + Userspace Interaction and Log Retrieval 45 + ======================================= 46 + On the next boot after a hang, pstore logs will be available in the pstore 47 + filesystem (``/sys/fs/pstore``) and can be retrieved by userspace. 48 + 49 + On systemd systems, the ``systemd-pstore`` service will help do the following: 50 + 51 + #. Locate pstore data in ``/sys/fs/pstore`` 52 + #. Read and save it to ``/var/lib/systemd/pstore`` 53 + #. Clear pstore data for the next event
+3
MAINTAINERS
··· 9188 9188 F: kernel/power/energy_model.c 9189 9189 F: include/linux/energy_model.h 9190 9190 F: Documentation/power/energy-model.rst 9191 + F: Documentation/netlink/specs/em.yaml 9192 + F: include/uapi/linux/energy_model.h 9193 + F: kernel/power/em_netlink*.* 9191 9194 9192 9195 EPAPR HYPERVISOR BYTE CHANNEL DEVICE DRIVER 9193 9196 M: Laurentiu Tudor <laurentiu.tudor@nxp.com>
+12 -12
drivers/acpi/acpi_tad.c
··· 90 90 args[0].buffer.pointer = (u8 *)rt; 91 91 args[0].buffer.length = sizeof(*rt); 92 92 93 - ACQUIRE(pm_runtime_active_try, pm)(dev); 94 - if (ACQUIRE_ERR(pm_runtime_active_try, &pm)) 93 + PM_RUNTIME_ACQUIRE(dev, pm); 94 + if (PM_RUNTIME_ACQUIRE_ERR(&pm)) 95 95 return -ENXIO; 96 96 97 97 status = acpi_evaluate_integer(handle, "_SRT", &arg_list, &retval); ··· 137 137 { 138 138 int ret; 139 139 140 - ACQUIRE(pm_runtime_active_try, pm)(dev); 141 - if (ACQUIRE_ERR(pm_runtime_active_try, &pm)) 140 + PM_RUNTIME_ACQUIRE(dev, pm); 141 + if (PM_RUNTIME_ACQUIRE_ERR(&pm)) 142 142 return -ENXIO; 143 143 144 144 ret = acpi_tad_evaluate_grt(dev, rt); ··· 275 275 args[0].integer.value = timer_id; 276 276 args[1].integer.value = value; 277 277 278 - ACQUIRE(pm_runtime_active_try, pm)(dev); 279 - if (ACQUIRE_ERR(pm_runtime_active_try, &pm)) 278 + PM_RUNTIME_ACQUIRE(dev, pm); 279 + if (PM_RUNTIME_ACQUIRE_ERR(&pm)) 280 280 return -ENXIO; 281 281 282 282 status = acpi_evaluate_integer(handle, method, &arg_list, &retval); ··· 322 322 323 323 args[0].integer.value = timer_id; 324 324 325 - ACQUIRE(pm_runtime_active_try, pm)(dev); 326 - if (ACQUIRE_ERR(pm_runtime_active_try, &pm)) 325 + PM_RUNTIME_ACQUIRE(dev, pm); 326 + if (PM_RUNTIME_ACQUIRE_ERR(&pm)) 327 327 return -ENXIO; 328 328 329 329 status = acpi_evaluate_integer(handle, method, &arg_list, &retval); ··· 377 377 378 378 args[0].integer.value = timer_id; 379 379 380 - ACQUIRE(pm_runtime_active_try, pm)(dev); 381 - if (ACQUIRE_ERR(pm_runtime_active_try, &pm)) 380 + PM_RUNTIME_ACQUIRE(dev, pm); 381 + if (PM_RUNTIME_ACQUIRE_ERR(&pm)) 382 382 return -ENXIO; 383 383 384 384 status = acpi_evaluate_integer(handle, "_CWS", &arg_list, &retval); ··· 417 417 418 418 args[0].integer.value = timer_id; 419 419 420 - ACQUIRE(pm_runtime_active_try, pm)(dev); 421 - if (ACQUIRE_ERR(pm_runtime_active_try, &pm)) 420 + PM_RUNTIME_ACQUIRE(dev, pm); 421 + if (PM_RUNTIME_ACQUIRE_ERR(&pm)) 422 422 return -ENXIO; 423 423 424 424 status = acpi_evaluate_integer(handle, "_GWS", &arg_list, &retval);
+25 -60
drivers/base/power/generic_ops.c
··· 8 8 #include <linux/pm_runtime.h> 9 9 #include <linux/export.h> 10 10 11 + #define CALL_PM_OP(dev, op) \ 12 + ({ \ 13 + struct device *_dev = (dev); \ 14 + const struct dev_pm_ops *pm = _dev->driver ? _dev->driver->pm : NULL; \ 15 + pm && pm->op ? pm->op(_dev) : 0; \ 16 + }) 17 + 11 18 #ifdef CONFIG_PM 12 19 /** 13 20 * pm_generic_runtime_suspend - Generic runtime suspend callback for subsystems. ··· 26 19 */ 27 20 int pm_generic_runtime_suspend(struct device *dev) 28 21 { 29 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 30 - int ret; 31 - 32 - ret = pm && pm->runtime_suspend ? pm->runtime_suspend(dev) : 0; 33 - 34 - return ret; 22 + return CALL_PM_OP(dev, runtime_suspend); 35 23 } 36 24 EXPORT_SYMBOL_GPL(pm_generic_runtime_suspend); 37 25 ··· 40 38 */ 41 39 int pm_generic_runtime_resume(struct device *dev) 42 40 { 43 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 44 - int ret; 45 - 46 - ret = pm && pm->runtime_resume ? pm->runtime_resume(dev) : 0; 47 - 48 - return ret; 41 + return CALL_PM_OP(dev, runtime_resume); 49 42 } 50 43 EXPORT_SYMBOL_GPL(pm_generic_runtime_resume); 51 44 #endif /* CONFIG_PM */ ··· 69 72 */ 70 73 int pm_generic_suspend_noirq(struct device *dev) 71 74 { 72 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 73 - 74 - return pm && pm->suspend_noirq ? pm->suspend_noirq(dev) : 0; 75 + return CALL_PM_OP(dev, suspend_noirq); 75 76 } 76 77 EXPORT_SYMBOL_GPL(pm_generic_suspend_noirq); 77 78 ··· 79 84 */ 80 85 int pm_generic_suspend_late(struct device *dev) 81 86 { 82 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 83 - 84 - return pm && pm->suspend_late ? pm->suspend_late(dev) : 0; 87 + return CALL_PM_OP(dev, suspend_late); 85 88 } 86 89 EXPORT_SYMBOL_GPL(pm_generic_suspend_late); 87 90 ··· 89 96 */ 90 97 int pm_generic_suspend(struct device *dev) 91 98 { 92 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 93 - 94 - return pm && pm->suspend ? pm->suspend(dev) : 0; 99 + return CALL_PM_OP(dev, suspend); 95 100 } 96 101 EXPORT_SYMBOL_GPL(pm_generic_suspend); 97 102 ··· 99 108 */ 100 109 int pm_generic_freeze_noirq(struct device *dev) 101 110 { 102 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 103 - 104 - return pm && pm->freeze_noirq ? pm->freeze_noirq(dev) : 0; 111 + return CALL_PM_OP(dev, freeze_noirq); 105 112 } 106 113 EXPORT_SYMBOL_GPL(pm_generic_freeze_noirq); 107 114 ··· 109 120 */ 110 121 int pm_generic_freeze(struct device *dev) 111 122 { 112 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 113 - 114 - return pm && pm->freeze ? pm->freeze(dev) : 0; 123 + return CALL_PM_OP(dev, freeze); 115 124 } 116 125 EXPORT_SYMBOL_GPL(pm_generic_freeze); 117 126 ··· 119 132 */ 120 133 int pm_generic_poweroff_noirq(struct device *dev) 121 134 { 122 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 123 - 124 - return pm && pm->poweroff_noirq ? pm->poweroff_noirq(dev) : 0; 135 + return CALL_PM_OP(dev, poweroff_noirq); 125 136 } 126 137 EXPORT_SYMBOL_GPL(pm_generic_poweroff_noirq); 127 138 ··· 129 144 */ 130 145 int pm_generic_poweroff_late(struct device *dev) 131 146 { 132 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 133 - 134 - return pm && pm->poweroff_late ? pm->poweroff_late(dev) : 0; 147 + return CALL_PM_OP(dev, poweroff_late); 135 148 } 136 149 EXPORT_SYMBOL_GPL(pm_generic_poweroff_late); 137 150 ··· 139 156 */ 140 157 int pm_generic_poweroff(struct device *dev) 141 158 { 142 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 143 - 144 - return pm && pm->poweroff ? pm->poweroff(dev) : 0; 159 + return CALL_PM_OP(dev, poweroff); 145 160 } 146 161 EXPORT_SYMBOL_GPL(pm_generic_poweroff); 147 162 ··· 149 168 */ 150 169 int pm_generic_thaw_noirq(struct device *dev) 151 170 { 152 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 153 - 154 - return pm && pm->thaw_noirq ? pm->thaw_noirq(dev) : 0; 171 + return CALL_PM_OP(dev, thaw_noirq); 155 172 } 156 173 EXPORT_SYMBOL_GPL(pm_generic_thaw_noirq); 157 174 ··· 159 180 */ 160 181 int pm_generic_thaw(struct device *dev) 161 182 { 162 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 163 - 164 - return pm && pm->thaw ? pm->thaw(dev) : 0; 183 + return CALL_PM_OP(dev, thaw); 165 184 } 166 185 EXPORT_SYMBOL_GPL(pm_generic_thaw); 167 186 ··· 169 192 */ 170 193 int pm_generic_resume_noirq(struct device *dev) 171 194 { 172 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 173 - 174 - return pm && pm->resume_noirq ? pm->resume_noirq(dev) : 0; 195 + return CALL_PM_OP(dev, resume_noirq); 175 196 } 176 197 EXPORT_SYMBOL_GPL(pm_generic_resume_noirq); 177 198 ··· 179 204 */ 180 205 int pm_generic_resume_early(struct device *dev) 181 206 { 182 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 183 - 184 - return pm && pm->resume_early ? pm->resume_early(dev) : 0; 207 + return CALL_PM_OP(dev, resume_early); 185 208 } 186 209 EXPORT_SYMBOL_GPL(pm_generic_resume_early); 187 210 ··· 189 216 */ 190 217 int pm_generic_resume(struct device *dev) 191 218 { 192 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 193 - 194 - return pm && pm->resume ? pm->resume(dev) : 0; 219 + return CALL_PM_OP(dev, resume); 195 220 } 196 221 EXPORT_SYMBOL_GPL(pm_generic_resume); 197 222 ··· 199 228 */ 200 229 int pm_generic_restore_noirq(struct device *dev) 201 230 { 202 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 203 - 204 - return pm && pm->restore_noirq ? pm->restore_noirq(dev) : 0; 231 + return CALL_PM_OP(dev, restore_noirq); 205 232 } 206 233 EXPORT_SYMBOL_GPL(pm_generic_restore_noirq); 207 234 ··· 209 240 */ 210 241 int pm_generic_restore_early(struct device *dev) 211 242 { 212 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 213 - 214 - return pm && pm->restore_early ? pm->restore_early(dev) : 0; 243 + return CALL_PM_OP(dev, restore_early); 215 244 } 216 245 EXPORT_SYMBOL_GPL(pm_generic_restore_early); 217 246 ··· 219 252 */ 220 253 int pm_generic_restore(struct device *dev) 221 254 { 222 - const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 223 - 224 - return pm && pm->restore ? pm->restore(dev) : 0; 255 + return CALL_PM_OP(dev, restore); 225 256 } 226 257 EXPORT_SYMBOL_GPL(pm_generic_restore); 227 258
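The `CALL_PM_OP()` helper introduced above relies on a GNU C statement expression so that it can both declare locals and yield a value. A userspace replica of the pattern (with mock stand-ins for `struct device` and `struct dev_pm_ops`; only the macro shape is taken from the patch) looks like this:

```c
#include <assert.h>
#include <stddef.h>

struct mock_device;

/* Mock stand-ins for the kernel structures used by the macro. */
struct dev_pm_ops {
	int (*suspend)(struct mock_device *dev);
};

struct mock_driver {
	const struct dev_pm_ops *pm;
};

struct mock_device {
	struct mock_driver *driver;
};

/* Same shape as the new CALL_PM_OP helper: evaluate the device
 * argument once, then call pm->op only if both the ops table and the
 * requested callback exist, defaulting to 0 otherwise. */
#define CALL_PM_OP(dev, op)						\
({									\
	struct mock_device *_dev = (dev);				\
	const struct dev_pm_ops *pm =					\
		_dev->driver ? _dev->driver->pm : NULL;			\
	pm && pm->op ? pm->op(_dev) : 0;				\
})

static int mock_suspend(struct mock_device *dev)
{
	(void)dev;
	return 42;
}
```

The statement expression is what lets one macro replace the repeated three-line bodies of every `pm_generic_*()` function without a helper function per callback name.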
+15
drivers/base/power/main.c
··· 34 34 #include <linux/cpufreq.h> 35 35 #include <linux/devfreq.h> 36 36 #include <linux/timer.h> 37 + #include <linux/nmi.h> 37 38 38 39 #include "../base.h" 39 40 #include "power.h" ··· 96 95 return "restore"; 97 96 case PM_EVENT_RECOVER: 98 97 return "recover"; 98 + case PM_EVENT_POWEROFF: 99 + return "poweroff"; 99 100 default: 100 101 return "(unknown PM event)"; 101 102 } ··· 370 367 case PM_EVENT_FREEZE: 371 368 case PM_EVENT_QUIESCE: 372 369 return ops->freeze; 370 + case PM_EVENT_POWEROFF: 373 371 case PM_EVENT_HIBERNATE: 374 372 return ops->poweroff; 375 373 case PM_EVENT_THAW: ··· 405 401 case PM_EVENT_FREEZE: 406 402 case PM_EVENT_QUIESCE: 407 403 return ops->freeze_late; 404 + case PM_EVENT_POWEROFF: 408 405 case PM_EVENT_HIBERNATE: 409 406 return ops->poweroff_late; 410 407 case PM_EVENT_THAW: ··· 440 435 case PM_EVENT_FREEZE: 441 436 case PM_EVENT_QUIESCE: 442 437 return ops->freeze_noirq; 438 + case PM_EVENT_POWEROFF: 443 439 case PM_EVENT_HIBERNATE: 444 440 return ops->poweroff_noirq; 445 441 case PM_EVENT_THAW: ··· 521 515 #define DECLARE_DPM_WATCHDOG_ON_STACK(wd) \ 522 516 struct dpm_watchdog wd 523 517 518 + static bool __read_mostly dpm_watchdog_all_cpu_backtrace; 519 + module_param(dpm_watchdog_all_cpu_backtrace, bool, 0644); 520 + MODULE_PARM_DESC(dpm_watchdog_all_cpu_backtrace, 521 + "Backtrace all CPUs on DPM watchdog timeout"); 522 + 524 523 /** 525 524 * dpm_watchdog_handler - Driver suspend / resume watchdog handler. 526 525 * @t: The timer that PM watchdog depends on. ··· 541 530 unsigned int time_left; 542 531 543 532 if (wd->fatal) { 533 + unsigned int this_cpu = smp_processor_id(); 534 + 544 535 dev_emerg(wd->dev, "**** DPM device timeout ****\n"); 545 536 show_stack(wd->tsk, NULL, KERN_EMERG); 537 + if (dpm_watchdog_all_cpu_backtrace) 538 + trigger_allbutcpu_cpu_backtrace(this_cpu); 546 539 panic("%s %s: unrecoverable failure\n", 547 540 dev_driver_string(wd->dev), dev_name(wd->dev)); 548 541 }
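The hunk above wires the new `PM_EVENT_POWEROFF` event into the existing callback selection: it falls through to the same `poweroff*` callback slots as `PM_EVENT_HIBERNATE`. A minimal userspace sketch of that dispatch (event names and the ops layout are illustrative stand-ins, not the kernel's actual definitions):

```c
#include <assert.h>
#include <stddef.h>

enum pm_event { PM_EVENT_FREEZE, PM_EVENT_HIBERNATE,
		PM_EVENT_POWEROFF, PM_EVENT_THAW };

typedef int (*pm_callback_t)(void);

struct mock_pm_ops {
	pm_callback_t freeze;
	pm_callback_t poweroff;
	pm_callback_t thaw;
};

static int do_freeze(void)   { return 1; }
static int do_poweroff(void) { return 2; }
static int do_thaw(void)     { return 3; }

/* Mirrors the pm_op() change: the new "poweroff" event shares the
 * hibernation poweroff callback rather than adding a new slot. */
static pm_callback_t pm_op(const struct mock_pm_ops *ops, enum pm_event event)
{
	switch (event) {
	case PM_EVENT_FREEZE:
		return ops->freeze;
	case PM_EVENT_POWEROFF:	/* falls through to the same callback */
	case PM_EVENT_HIBERNATE:
		return ops->poweroff;
	case PM_EVENT_THAW:
		return ops->thaw;
	default:
		return NULL;
	}
}
```

Reusing the `poweroff*` slots means drivers that already implement hibernation callbacks need no changes to participate in the new shutdown-time event.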
+16 -7
drivers/base/power/runtime.c
··· 90 90 /* 91 91 * Because ktime_get_mono_fast_ns() is not monotonic during 92 92 * timekeeping updates, ensure that 'now' is after the last saved 93 - * timesptamp. 93 + * timestamp. 94 94 */ 95 95 if (now < last) 96 96 return; ··· 217 217 * resume/suspend callback of any one of its ancestors(or the 218 218 * block device itself), the deadlock may be triggered inside the 219 219 * memory allocation since it might not complete until the block 220 - * device becomes active and the involed page I/O finishes. The 220 + * device becomes active and the involved page I/O finishes. The 221 221 * situation is pointed out first by Alan Stern. Network device 222 222 * are involved in iSCSI kind of situation. 223 223 * ··· 1210 1210 * 1211 1211 * Otherwise, if its runtime PM status is %RPM_ACTIVE and (1) @ign_usage_count 1212 1212 * is set, or (2) @dev is not ignoring children and its active child count is 1213 - * nonero, or (3) the runtime PM usage counter of @dev is not zero, increment 1213 + * nonzero, or (3) the runtime PM usage counter of @dev is not zero, increment 1214 1214 * the usage counter of @dev and return 1. 1215 1215 * 1216 1216 * Otherwise, return 0 without changing the usage counter. ··· 1664 1664 * pm_runtime_forbid - Block runtime PM of a device. 1665 1665 * @dev: Device to handle. 1666 1666 * 1667 - * Increase the device's usage count and clear its power.runtime_auto flag, 1668 - * so that it cannot be suspended at run time until pm_runtime_allow() is called 1669 - * for it. 1667 + * Resume @dev if already suspended and block runtime suspend of @dev in such 1668 + * a way that it can be unblocked via the /sys/devices/.../power/control 1669 + * interface, or otherwise by calling pm_runtime_allow(). 1670 + * 1671 + * Calling this function many times in a row has the same effect as calling it 1672 + * once. 1670 1673 */ 1671 1674 void pm_runtime_forbid(struct device *dev) 1672 1675 { ··· 1690 1687 * pm_runtime_allow - Unblock runtime PM of a device. 1691 1688 * @dev: Device to handle. 1692 1689 * 1693 - * Decrease the device's usage count and set its power.runtime_auto flag. 1690 + * Unblock runtime suspend of @dev after it has been blocked by 1691 + * pm_runtime_forbid() (for instance, if it has been blocked via the 1692 + * /sys/devices/.../power/control interface), check if @dev can be 1693 + * suspended and suspend it in that case. 1694 + * 1695 + * Calling this function many times in a row has the same effect as calling it 1696 + * once. 1694 1697 */ 1695 1698 void pm_runtime_allow(struct device *dev) 1696 1699 {
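The updated kernel-doc above stresses that `pm_runtime_forbid()` and `pm_runtime_allow()` are idempotent: repeated calls must not unbalance the usage counter. A toy model of that contract (field names follow the kernel's `dev->power` struct, the logic is a sketch of the documented semantics only):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: forbid/allow are guarded by the runtime_auto flag, so
 * calling either many times in a row has the same effect as once. */
struct mock_dev_power {
	bool runtime_auto;	/* true: runtime suspend allowed */
	int usage_count;	/* nonzero keeps the device active */
};

static void mock_runtime_forbid(struct mock_dev_power *p)
{
	if (!p->runtime_auto)
		return;		/* already forbidden: no-op */
	p->runtime_auto = false;
	p->usage_count++;	/* keeps the device resumed */
}

static void mock_runtime_allow(struct mock_dev_power *p)
{
	if (p->runtime_auto)
		return;		/* already allowed: no-op */
	p->runtime_auto = true;
	p->usage_count--;	/* may let the device suspend again */
}
```

Without the flag guard, a caller invoking forbid twice and allow once would leave the counter permanently elevated, which is exactly the kind of imbalance the documented behavior rules out.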
+1 -3
drivers/base/power/trace.c
··· 238 238 unsigned int hash = hash_string(DEVSEED, dev_name(dev), 239 239 DEVHASH); 240 240 if (hash == value) { 241 - int len = snprintf(buf, size, "%s\n", 241 + int len = scnprintf(buf, size, "%s\n", 242 242 dev_driver_string(dev)); 243 - if (len > size) 244 - len = size; 245 243 buf += len; 246 244 ret += len; 247 245 size -= len;
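The trace.c change works because `snprintf()` returns the length the output *would* have had, which can exceed the buffer, while the kernel's `scnprintf()` returns the number of characters actually stored (excluding the trailing NUL), making the manual clamp, which was off by one on truncation anyway, unnecessary. A userspace model of `scnprintf()` (the kernel function is not in libc, so `my_scnprintf` here is a stand-in) shows the difference:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Userspace model of the kernel's scnprintf(): like snprintf(), but
 * returns the number of characters actually written into buf, never
 * the would-be length of the full formatted string. */
static int my_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;
	if ((size_t)n >= size)	/* truncated: size - 1 chars + NUL stored */
		return size ? (int)size - 1 : 0;
	return n;
}
```

With the old `snprintf()` return value, `buf += len` could advance past the terminating NUL on truncation; the `scnprintf()` semantics make the pointer arithmetic in the loop safe by construction.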
+11 -13
drivers/base/power/wakeup.c
··· 189 189 if (WARN_ON(!ws)) 190 190 return; 191 191 192 + /* 193 + * After shutting down the timer, wakeup_source_activate() will warn if 194 + * the given wakeup source is passed to it. 195 + */ 196 + timer_shutdown_sync(&ws->timer); 197 + 192 198 raw_spin_lock_irqsave(&events_lock, flags); 193 199 list_del_rcu(&ws->entry); 194 200 raw_spin_unlock_irqrestore(&events_lock, flags); 195 201 synchronize_srcu(&wakeup_srcu); 196 - 197 - timer_delete_sync(&ws->timer); 198 - /* 199 - * Clear timer.function to make wakeup_source_not_registered() treat 200 - * this wakeup source as not registered. 201 - */ 202 - ws->timer.function = NULL; 203 202 } 204 203 205 204 /** ··· 505 506 EXPORT_SYMBOL_GPL(device_set_wakeup_enable); 506 507 507 508 /** 508 - * wakeup_source_not_registered - validate the given wakeup source. 509 + * wakeup_source_not_usable - validate the given wakeup source. 509 510 * @ws: Wakeup source to be validated. 510 511 */ 511 - static bool wakeup_source_not_registered(struct wakeup_source *ws) 512 + static bool wakeup_source_not_usable(struct wakeup_source *ws) 512 513 { 513 514 /* 514 - * Use timer struct to check if the given source is initialized 515 - * by wakeup_source_add. 515 + * Use the timer struct to check if the given wakeup source has been 516 + * initialized by wakeup_source_add() and it is not going away. 516 517 */ 517 518 return ws->timer.function != pm_wakeup_timer_fn; 518 519 } ··· 557 558 { 558 559 unsigned int cec; 559 560 560 - if (WARN_ONCE(wakeup_source_not_registered(ws), 561 - "unregistered wakeup source\n")) 561 + if (WARN_ONCE(wakeup_source_not_usable(ws), "unusable wakeup source\n")) 562 562 return; 563 563 564 564 ws->active = true;
+1 -1
drivers/cpufreq/acpi-cpufreq.c
··· 395 395 cur_freq = extract_freq(policy, get_cur_val(mask, data)); 396 396 if (cur_freq == freq) 397 397 return 1; 398 - udelay(10); 398 + usleep_range(10, 15); 399 399 } 400 400 return 0; 401 401 }
+15 -20
drivers/cpufreq/amd-pstate.c
··· 65 65 [AMD_PSTATE_PASSIVE] = "passive", 66 66 [AMD_PSTATE_ACTIVE] = "active", 67 67 [AMD_PSTATE_GUIDED] = "guided", 68 - NULL, 69 68 }; 69 + static_assert(ARRAY_SIZE(amd_pstate_mode_string) == AMD_PSTATE_MAX); 70 70 71 71 const char *amd_pstate_get_mode_string(enum amd_pstate_mode mode) 72 72 { 73 - if (mode < 0 || mode >= AMD_PSTATE_MAX) 74 - return NULL; 73 + if (mode < AMD_PSTATE_UNDEFINED || mode >= AMD_PSTATE_MAX) 74 + mode = AMD_PSTATE_UNDEFINED; 75 75 return amd_pstate_mode_string[mode]; 76 76 } 77 77 EXPORT_SYMBOL_GPL(amd_pstate_get_mode_string); ··· 110 110 EPP_INDEX_BALANCE_PERFORMANCE, 111 111 EPP_INDEX_BALANCE_POWERSAVE, 112 112 EPP_INDEX_POWERSAVE, 113 + EPP_INDEX_MAX, 113 114 }; 114 115 115 116 static const char * const energy_perf_strings[] = { 119 118 [EPP_INDEX_BALANCE_PERFORMANCE] = "balance_performance", 120 119 [EPP_INDEX_BALANCE_POWERSAVE] = "balance_power", 121 120 [EPP_INDEX_POWERSAVE] = "power", 122 - NULL 123 121 }; 122 + static_assert(ARRAY_SIZE(energy_perf_strings) == EPP_INDEX_MAX); 124 123 125 124 static unsigned int epp_values[] = { 126 125 [EPP_INDEX_DEFAULT] = 0, ··· 128 127 [EPP_INDEX_BALANCE_PERFORMANCE] = AMD_CPPC_EPP_BALANCE_PERFORMANCE, 129 128 [EPP_INDEX_BALANCE_POWERSAVE] = AMD_CPPC_EPP_BALANCE_POWERSAVE, 130 129 [EPP_INDEX_POWERSAVE] = AMD_CPPC_EPP_POWERSAVE, 131 - }; 130 + }; 131 + static_assert(ARRAY_SIZE(epp_values) == EPP_INDEX_MAX); 132 132 133 133 typedef int (*cppc_mode_transition_fn)(int); 134 134 ··· 185 183 { 186 184 int i; 187 185 188 - for (i=0; i < AMD_PSTATE_MAX; i++) { 186 + for (i = 0; i < AMD_PSTATE_MAX; i++) { 189 187 if (!strncmp(str, amd_pstate_mode_string[i], size)) 190 188 return i; 191 189 } ··· 1139 1137 static ssize_t show_energy_performance_available_preferences( 1140 1138 struct cpufreq_policy *policy, char *buf) 1141 1139 { 1142 - int i = 0; 1143 - int offset = 0; 1140 + int offset = 0, i; 1144 1141 struct amd_cpudata *cpudata = policy->driver_data; 1145 1142 1146 1143 if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) 1147 1144 return sysfs_emit_at(buf, offset, "%s\n", 1148 1145 energy_perf_strings[EPP_INDEX_PERFORMANCE]); 1149 1146 1150 - while (energy_perf_strings[i] != NULL) 1151 - offset += sysfs_emit_at(buf, offset, "%s ", energy_perf_strings[i++]); 1147 + for (i = 0; i < ARRAY_SIZE(energy_perf_strings); i++) 1148 + offset += sysfs_emit_at(buf, offset, "%s ", energy_perf_strings[i]); 1152 1149 1153 1150 offset += sysfs_emit_at(buf, offset, "\n"); 1154 1151 ··· 1158 1157 struct cpufreq_policy *policy, const char *buf, size_t count) 1159 1158 { 1160 1159 struct amd_cpudata *cpudata = policy->driver_data; 1161 - char str_preference[21]; 1162 1160 ssize_t ret; 1163 1161 u8 epp; 1164 1162 1165 - ret = sscanf(buf, "%20s", str_preference); 1166 - if (ret != 1) 1167 - return -EINVAL; 1168 - 1169 - ret = match_string(energy_perf_strings, -1, str_preference); 1163 + ret = sysfs_match_string(energy_perf_strings, buf); 1170 1164 if (ret < 0) 1171 1165 return -EINVAL; 1172 1166 ··· 1278 1282 if (cpu_feature_enabled(X86_FEATURE_CPPC) || cppc_state == AMD_PSTATE_ACTIVE) 1279 1283 return 0; 1280 1284 1281 - for_each_present_cpu(cpu) { 1285 + for_each_online_cpu(cpu) { 1282 1286 cppc_set_auto_sel(cpu, (cppc_state == AMD_PSTATE_PASSIVE) ? 0 : 1); 1283 1287 } 1284 1288 ··· 1349 1353 return -EINVAL; 1350 1354 1351 1355 mode_idx = get_mode_idx_from_str(buf, size); 1352 - 1353 - if (mode_idx < 0 || mode_idx >= AMD_PSTATE_MAX) 1356 - return -EINVAL; 1357 + if (mode_idx < 0) 1358 + return mode_idx; 1355 1358 1356 1359 if (mode_state_machine[cppc_state][mode_idx]) { 1357 1360 guard(mutex)(&amd_pstate_driver_lock);
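The amd-pstate cleanup above drops the `NULL` sentinel from each string table and instead pins the table size to an enum `_MAX` value with `static_assert()`, then matches sysfs input against the table by index. The pattern in isolation (the EPP names below are a shortened illustrative subset; `match_epp_string` is a rough stand-in for the kernel's `sysfs_match_string()`):

```c
#include <assert.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* End the enum with a _MAX sentinel... */
enum epp_index {
	EPP_INDEX_DEFAULT = 0,
	EPP_INDEX_PERFORMANCE,
	EPP_INDEX_POWERSAVE,
	EPP_INDEX_MAX,
};

/* ...and pin the table size to it at compile time, so adding an enum
 * value without a string (or vice versa) fails the build instead of
 * silently walking past the end of the array. */
static const char * const epp_strings[] = {
	[EPP_INDEX_DEFAULT]	= "default",
	[EPP_INDEX_PERFORMANCE]	= "performance",
	[EPP_INDEX_POWERSAVE]	= "power",
};
_Static_assert(ARRAY_SIZE(epp_strings) == EPP_INDEX_MAX,
	       "epp_strings must cover every epp_index value");

/* Rough stand-in for sysfs_match_string(): match a possibly
 * newline-terminated sysfs write against the table. */
static int match_epp_string(const char *buf)
{
	size_t len = strcspn(buf, "\n");
	size_t i;

	for (i = 0; i < ARRAY_SIZE(epp_strings); i++) {
		if (strlen(epp_strings[i]) == len &&
		    !strncmp(buf, epp_strings[i], len))
			return (int)i;
	}
	return -1;
}
```

Compared with a `NULL`-terminated table walked by `while (tbl[i])`, the sentinel-free form lets `ARRAY_SIZE()` bound every loop and moves the consistency check to compile time.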
+8 -9
drivers/cpufreq/cppc_cpufreq.c
··· 142 142 init_irq_work(&cppc_fi->irq_work, cppc_irq_work); 143 143 144 144 ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs); 145 - if (ret) { 146 - pr_warn("%s: failed to read perf counters for cpu:%d: %d\n", 147 - __func__, cpu, ret); 148 145 149 - /* 150 - * Don't abort if the CPU was offline while the driver 151 - * was getting registered. 152 - */ 153 - if (cpu_online(cpu)) 154 - return; 146 + /* 147 + * Don't abort as the CPU was offline while the driver was 148 + * getting registered. 149 + */ 150 + if (ret && cpu_online(cpu)) { 151 + pr_debug("%s: failed to read perf counters for cpu:%d: %d\n", 152 + __func__, cpu, ret); 153 + return; 155 154 } 156 155 } 157 156
+1
drivers/cpufreq/cpufreq-dt-platdev.c
··· 87 87 { .compatible = "st-ericsson,u9540", }, 88 88 89 89 { .compatible = "starfive,jh7110", }, 90 + { .compatible = "starfive,jh7110s", }, 90 91 91 92 { .compatible = "ti,omap2", }, 92 93 { .compatible = "ti,omap4", },
+3
drivers/cpufreq/cpufreq-nforce2.c
··· 145 145 pci_read_config_dword(nforce2_sub5, NFORCE2_BOOTFSB, &fsb); 146 146 fsb /= 1000000; 147 147 148 + pci_dev_put(nforce2_sub5); 149 + 148 150 /* Check if PLL register is already set */ 149 151 pci_read_config_byte(nforce2_dev, NFORCE2_PLLENABLE, (u8 *)&temp); 150 152 ··· 428 426 static void __exit nforce2_exit(void) 429 427 { 430 428 cpufreq_unregister_driver(&nforce2_driver); 429 + pci_dev_put(nforce2_dev); 431 430 } 432 431 433 432 module_init(nforce2_init);
+7 -4
drivers/cpufreq/cpufreq.c
··· 1421 1421 * If there is a problem with its frequency table, take it 1422 1422 * offline and drop it. 1423 1423 */ 1424 - ret = cpufreq_table_validate_and_sort(policy); 1425 - if (ret) 1426 - goto out_offline_policy; 1424 + if (policy->freq_table_sorted != CPUFREQ_TABLE_SORTED_ASCENDING && 1425 + policy->freq_table_sorted != CPUFREQ_TABLE_SORTED_DESCENDING) { 1426 + ret = cpufreq_table_validate_and_sort(policy); 1427 + if (ret) 1428 + goto out_offline_policy; 1429 + } 1427 1430 1428 1431 /* related_cpus should at least include policy->cpus. */ 1429 1432 cpumask_copy(policy->related_cpus, policy->cpus); ··· 2553 2550 for_each_inactive_policy(policy) { 2554 2551 if (!strcmp(policy->last_governor, governor->name)) { 2555 2552 policy->governor = NULL; 2556 - strcpy(policy->last_governor, "\0"); 2553 + policy->last_governor[0] = '\0'; 2557 2554 } 2558 2555 } 2559 2556 read_unlock_irqrestore(&cpufreq_driver_lock, flags);
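One small fix above replaces `strcpy(policy->last_governor, "\0")` with a direct `policy->last_governor[0] = '\0'`. The two are behaviorally identical, `"\0"` is just the empty string, so `strcpy()` copies nothing but the terminator; the direct store simply avoids a pointless library call. A sketch demonstrating the equivalence (function names are illustrative):

```c
#include <assert.h>
#include <string.h>

/* "\0" is the empty string "", so strcpy() copies only the
 * terminating NUL byte... */
static void clear_by_strcpy(char *buf)
{
	strcpy(buf, "\0");
}

/* ...which leaves the buffer in exactly the same state as writing
 * the terminator directly. */
static void clear_by_store(char *buf)
{
	buf[0] = '\0';
}
```

Note that neither form zeroes the rest of the buffer; both only make the string empty, which is all the `last_governor` reset needs.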
+99 -130
drivers/cpufreq/intel_pstate.c
··· 575 575 int scaling = cpu->pstate.scaling; 576 576 int freq; 577 577 578 - pr_debug("CPU%d: perf_ctl_max_phys = %d\n", cpu->cpu, perf_ctl_max_phys); 579 - pr_debug("CPU%d: perf_ctl_turbo = %d\n", cpu->cpu, perf_ctl_turbo); 580 - pr_debug("CPU%d: perf_ctl_scaling = %d\n", cpu->cpu, perf_ctl_scaling); 578 + pr_debug("CPU%d: PERF_CTL max_phys = %d\n", cpu->cpu, perf_ctl_max_phys); 579 + pr_debug("CPU%d: PERF_CTL turbo = %d\n", cpu->cpu, perf_ctl_turbo); 580 + pr_debug("CPU%d: PERF_CTL scaling = %d\n", cpu->cpu, perf_ctl_scaling); 581 581 pr_debug("CPU%d: HWP_CAP guaranteed = %d\n", cpu->cpu, cpu->pstate.max_pstate); 582 582 pr_debug("CPU%d: HWP_CAP highest = %d\n", cpu->cpu, cpu->pstate.turbo_pstate); 583 583 pr_debug("CPU%d: HWP-to-frequency scaling factor: %d\n", cpu->cpu, scaling); 584 + 585 + if (scaling == perf_ctl_scaling) 586 + return; 587 + 588 + hwp_is_hybrid = true; 584 589 585 590 cpu->pstate.turbo_freq = rounddown(cpu->pstate.turbo_pstate * scaling, 586 591 perf_ctl_scaling); ··· 914 909 [HWP_CPUFREQ_ATTR_COUNT] = NULL, 915 910 }; 916 911 912 + static u8 hybrid_get_cpu_type(unsigned int cpu) 913 + { 914 + return cpu_data(cpu).topo.intel_type; 915 + } 916 + 917 917 static bool no_cas __ro_after_init; 918 918 919 919 static struct cpudata *hybrid_max_perf_cpu __read_mostly; ··· 935 925 unsigned long *freq) 936 926 { 937 927 /* 938 - * Create "utilization bins" of 0-40%, 40%-60%, 60%-80%, and 80%-100% 939 - * of the maximum capacity such that two CPUs of the same type will be 940 - * regarded as equally attractive if the utilization of each of them 941 - * falls into the same bin, which should prevent tasks from being 942 - * migrated between them too often. 928 + * Create four "states" corresponding to 40%, 60%, 80%, and 100% of the 929 + * full capacity. 943 930 * 944 931 * For this purpose, return the "frequency" of 2 for the first 945 932 * performance level and otherwise leave the value set by the caller. ··· 950 943 return 0; 951 944 } 952 945 946 + static bool hybrid_has_l3(unsigned int cpu) 947 + { 948 + struct cpu_cacheinfo *cacheinfo = get_cpu_cacheinfo(cpu); 949 + unsigned int i; 950 + 951 + if (!cacheinfo) 952 + return false; 953 + 954 + for (i = 0; i < cacheinfo->num_leaves; i++) { 955 + if (cacheinfo->info_list[i].level == 3) 956 + return true; 957 + } 958 + 959 + return false; 960 + } 961 + 953 962 static int hybrid_get_cost(struct device *dev, unsigned long freq, 954 963 unsigned long *cost) 955 964 { 956 - struct pstate_data *pstate = &all_cpu_data[dev->id]->pstate; 957 - struct cpu_cacheinfo *cacheinfo = get_cpu_cacheinfo(dev->id); 958 - 965 + /* Facilitate load balancing between CPUs of the same type. */ 966 + *cost = freq; 959 967 /* 960 - * The smaller the perf-to-frequency scaling factor, the larger the IPC 961 - * ratio between the given CPU and the least capable CPU in the system. 962 - * Regard that IPC ratio as the primary cost component and assume that 963 - * the scaling factors for different CPU types will differ by at least 964 - * 5% and they will not be above INTEL_PSTATE_CORE_SCALING. 968 + * Adjust the cost depending on CPU type. 965 969 * 966 - * Add the freq value to the cost, so that the cost of running on CPUs 967 - * of the same type in different "utilization bins" is different. 970 + * The idea is to start loading up LPE-cores before E-cores and start 971 + * to populate E-cores when LPE-cores are utilized above 60% of the 972 + * capacity. Similarly, P-cores start to be populated when E-cores are 973 + * utilized above 60% of the capacity. 968 974 */ 969 - *cost = div_u64(100ULL * INTEL_PSTATE_CORE_SCALING, pstate->scaling) + freq; 970 - /* 971 - * Increase the cost slightly for CPUs able to access L3 to avoid 972 - * touching it in case some other CPUs of the same type can do the work 973 - * without it. 974 - */ 975 - if (cacheinfo) { 976 - unsigned int i; 977 - 978 - /* Check if L3 cache is there. */ 979 - for (i = 0; i < cacheinfo->num_leaves; i++) { 980 - if (cacheinfo->info_list[i].level == 3) { 981 - *cost += 2; 982 - break; 983 - } 984 - } 975 + if (hybrid_get_cpu_type(dev->id) == INTEL_CPU_TYPE_ATOM) { 976 + if (hybrid_has_l3(dev->id)) /* E-core */ 977 + *cost += 1; 978 + } else { /* P-core */ 979 + *cost += 2; 985 980 } 986 981 987 982 return 0; ··· 1046 1037 1047 1038 topology_set_cpu_scale(cpu->cpu, arch_scale_cpu_capacity(cpu->cpu)); 1048 1039 1049 - pr_debug("CPU%d: perf = %u, max. perf = %u, base perf = %d\n", cpu->cpu, 1050 - cpu->capacity_perf, hybrid_max_perf_cpu->capacity_perf, 1051 - cpu->pstate.max_pstate_physical); 1040 + pr_debug("CPU%d: capacity perf = %u, base perf = %u, sys max perf = %u\n", 1041 + cpu->cpu, cpu->capacity_perf, cpu->pstate.max_pstate_physical, 1042 + hybrid_max_perf_cpu->capacity_perf); 1052 1043 } 1053 1044 1054 1045 static void hybrid_clear_cpu_capacity(unsigned int cpunum) ··· 1393 1384 { 1394 1385 u64 power_ctl; 1395 1386 1396 - mutex_lock(&intel_pstate_driver_lock); 1387 + guard(mutex)(&intel_pstate_driver_lock); 1388 + 1397 1389 rdmsrq(MSR_IA32_POWER_CTL, power_ctl); 1398 1390 if (input) { 1399 1391 power_ctl &= ~BIT(MSR_IA32_POWER_CTL_BIT_EE); 1404 1394 power_ctl_ee_state = POWER_CTL_EE_DISABLE; 1405 1395 } 1406 1396 wrmsrq(MSR_IA32_POWER_CTL, power_ctl); 1407 - mutex_unlock(&intel_pstate_driver_lock); 1408 1397 } 1409 1398 1410 1399 static void intel_pstate_hwp_enable(struct cpudata *cpudata); ··· 1525 1516 static ssize_t show_status(struct kobject *kobj, 1526 1517 struct kobj_attribute *attr, char *buf) 1527 1518 { 1528 - ssize_t ret; 1519 + guard(mutex)(&intel_pstate_driver_lock); 1529 1520 1530 - mutex_lock(&intel_pstate_driver_lock); 1531 - ret = intel_pstate_show_status(buf); 1532 - mutex_unlock(&intel_pstate_driver_lock); 1533 - 1534 - return ret; 1521 + return intel_pstate_show_status(buf); 1535 1522 } 1536 1523 1537 1524 static ssize_t store_status(struct kobject *a, struct kobj_attribute *b, ··· 1536 1531 char *p = memchr(buf, '\n', count); 1537 1532 int ret; 1538 1533 1539 - mutex_lock(&intel_pstate_driver_lock); 1540 - ret = intel_pstate_update_status(buf, p ? p - buf : count); 1541 - mutex_unlock(&intel_pstate_driver_lock); 1534 + guard(mutex)(&intel_pstate_driver_lock); 1542 1535 1543 - return ret < 0 ? ret : count; 1536 + ret = intel_pstate_update_status(buf, p ? p - buf : count); 1537 + if (ret < 0) 1538 + return ret; 1539 + 1540 + return count; 1544 1541 } 1545 1542 1546 1543 static ssize_t show_turbo_pct(struct kobject *kobj, ··· 1552 1545 int total, no_turbo, turbo_pct; 1553 1546 uint32_t turbo_fp; 1554 1547 1555 - mutex_lock(&intel_pstate_driver_lock); 1548 + guard(mutex)(&intel_pstate_driver_lock); 1556 1549 1557 - if (!intel_pstate_driver) { 1558 - mutex_unlock(&intel_pstate_driver_lock); 1550 + if (!intel_pstate_driver) 1559 1551 return -EAGAIN; 1560 - } 1561 1552 1562 1553 cpu = all_cpu_data[0]; 1563 1554 ··· 1563 1558 no_turbo = cpu->pstate.max_pstate - cpu->pstate.min_pstate + 1; 1564 1559 turbo_fp = div_fp(no_turbo, total); 1565 1560 turbo_pct = 100 - fp_toint(mul_fp(turbo_fp, int_tofp(100))); 1566 - 1567 - mutex_unlock(&intel_pstate_driver_lock); 1568 1561 1569 1562 return sprintf(buf, "%u\n", turbo_pct); 1570 1563 } ··· 1573 1570 struct cpudata *cpu; 1574 1571 int total; 1575 1572 1576 - mutex_lock(&intel_pstate_driver_lock); 1573 + guard(mutex)(&intel_pstate_driver_lock); 1577 1574 1578 - if (!intel_pstate_driver) { 1579 - mutex_unlock(&intel_pstate_driver_lock); 1575 + if (!intel_pstate_driver) 1580 1576 return -EAGAIN; 1581 - } 1582 1577 1583 1578 cpu = all_cpu_data[0]; 1584 1579 total = cpu->pstate.turbo_pstate - cpu->pstate.min_pstate + 1; 1585 - 1586 - mutex_unlock(&intel_pstate_driver_lock); 1587 1580 1588 1581 return sprintf(buf, "%u\n", total); 1589 1582 } ··· 1587 1588 static ssize_t show_no_turbo(struct kobject *kobj, 1588 1589 struct kobj_attribute *attr, char *buf) 1589 1590 { 1590 - ssize_t ret; 1591 + guard(mutex)(&intel_pstate_driver_lock); 1591 1592 1592 - mutex_lock(&intel_pstate_driver_lock); 1593 - 1594 - if (!intel_pstate_driver) { 1595 - mutex_unlock(&intel_pstate_driver_lock); 1593 + if (!intel_pstate_driver) 1596 1594 return -EAGAIN; 1597 - } 1598 1595 1599 - ret = sprintf(buf, "%u\n", global.no_turbo); 1600 - 1601 - mutex_unlock(&intel_pstate_driver_lock); 1602 - 1603 - return ret; 1596 + return sprintf(buf, "%u\n", global.no_turbo); 1604 1597 } 1605 1598 1606 1599 static ssize_t store_no_turbo(struct kobject *a, struct kobj_attribute *b, ··· 1604 1613 if (sscanf(buf, "%u", &input) != 1) 1605 1614 return -EINVAL; 1606 1615 1607 - mutex_lock(&intel_pstate_driver_lock); 1616 + guard(mutex)(&intel_pstate_driver_lock); 1608 1617 1609 - if (!intel_pstate_driver) { 1610 - count = -EAGAIN; 1611 - goto unlock_driver; 1612 - } 1618 + if (!intel_pstate_driver) 1619 + return -EAGAIN; 1613 1620 1614 1621 no_turbo = !!clamp_t(int, input, 0, 1); 1615 1622 1616 1623 WRITE_ONCE(global.turbo_disabled, turbo_is_disabled()); 1617 1624 if (global.turbo_disabled && !no_turbo) { 1618 1625 pr_notice("Turbo disabled by BIOS or unavailable on processor\n"); 1619 - count = -EPERM; 1620 1626 if (global.no_turbo) 1621 - goto unlock_driver; 1622 - else 1623 - no_turbo = 1; 1627 + return -EPERM; 1628 + 1629 + no_turbo = 1; 1624 1630 } 1625 1631 1626 - if (no_turbo == global.no_turbo) { 1627 - goto unlock_driver; 1628 - } 1632 + if (no_turbo == global.no_turbo) 1633 + return count; 1629 1634 1630 1635 WRITE_ONCE(global.no_turbo, no_turbo); 1631 1636 ··· 1640 1653 1641 1654 intel_pstate_update_limits_for_all(); 1642 1655 arch_set_max_freq_ratio(no_turbo); 1643 - 1644 - unlock_driver: 1645 - mutex_unlock(&intel_pstate_driver_lock); 1646 1656 1647 1657 return count; 1648 1658 } ··· 1690 1706 if (ret != 1) 1691 1707 return -EINVAL; 1692 1708 1693 - mutex_lock(&intel_pstate_driver_lock); 1709 + guard(mutex)(&intel_pstate_driver_lock); 1694 1710 1695 - if (!intel_pstate_driver) { 1696 - mutex_unlock(&intel_pstate_driver_lock); 1711 + if (!intel_pstate_driver) 1697 1712 return -EAGAIN; 1698 - } 1699 1713 1700 1714 mutex_lock(&intel_pstate_limits_lock); 1701 1715 ··· 1705 1723 intel_pstate_update_policies(); 1706 1724 else 1707 1725 update_qos_requests(FREQ_QOS_MAX); 1708 - 1709 - mutex_unlock(&intel_pstate_driver_lock); 1710 1726 1711 1727 return count; 1712 1728 } ··· 1719 1739 if (ret != 1) 1720 1740 return -EINVAL; 1721 1741 1722 - mutex_lock(&intel_pstate_driver_lock); 1742 + guard(mutex)(&intel_pstate_driver_lock); 1723 1743 1724 - if (!intel_pstate_driver) { 1725 - mutex_unlock(&intel_pstate_driver_lock); 1744 + if (!intel_pstate_driver) 1726 1745 return -EAGAIN; 1727 - } 1728 1746 1729 1747 mutex_lock(&intel_pstate_limits_lock); 1730 1748 ··· 1735 1757 intel_pstate_update_policies(); 1736 1758 else 1737 1759 update_qos_requests(FREQ_QOS_MIN); 1738 - 1739 - mutex_unlock(&intel_pstate_driver_lock); 1740 1760 1741 1761 return count; 1742 1762 } ··· 1756 1780 if (ret) 1757 1781 return ret; 1758 1782 1759 - mutex_lock(&intel_pstate_driver_lock); 1783 + guard(mutex)(&intel_pstate_driver_lock); 1784 + 1760 1785 hwp_boost = !!input; 1761 1786 intel_pstate_update_policies(); 1762 - mutex_unlock(&intel_pstate_driver_lock); 1763 1787 1764 1788 return count; 1765 1789 } ··· 2048 2072 intel_pstate_update_epp_defaults(cpudata); 2049 2073 } 2050 2074 2075 + static u64 get_perf_ctl_val(int pstate) 2076 + { 2077 + u64 val; 2078 + 2079 + val = (u64)pstate << 8; 2080 + if (READ_ONCE(global.no_turbo) && !READ_ONCE(global.turbo_disabled) && 2081 + cpu_feature_enabled(X86_FEATURE_IDA)) 2082 + val |= (u64)1 << 32; 2083 + 2084 + return val; 2085 + } 2086 + 2051 2087 static int atom_get_min_pstate(int not_used) 2052 2088 { 2053 2089 u64 value; ··· 2086 2098 2087 2099 static u64 atom_get_val(struct cpudata *cpudata, int pstate) 2088 2100 { 2089 - u64 val; 2101 + u64 val = get_perf_ctl_val(pstate); 2090 2102 int32_t vid_fp; 2091 2103 u32 vid; 2092 - 2093 - val = (u64)pstate << 8; 2094 - if (READ_ONCE(global.no_turbo) && !READ_ONCE(global.turbo_disabled) && 2095 - cpu_feature_enabled(X86_FEATURE_IDA)) 2096 - val |= (u64)1 << 32; 2097 2104 2098 2105 vid_fp = cpudata->vid.min + mul_fp( 2099 2106 int_tofp(pstate - cpudata->pstate.min_pstate), ··· 2249 2266 2250 2267 static u64 core_get_val(struct cpudata *cpudata, int pstate) 2251 2268 { 2252 - u64 val; 2253 - 2254 - val = (u64)pstate << 8; 2255 - if (READ_ONCE(global.no_turbo) && !READ_ONCE(global.turbo_disabled) && 2256 - cpu_feature_enabled(X86_FEATURE_IDA)) 2257 - val |= (u64)1 << 32; 2258 - 2259 - return val; 2269 + return get_perf_ctl_val(pstate); 2260 2270 } 2261 2271 2262 2272 static int knl_get_aperf_mperf_shift(void) ··· 2273 2297 static int hwp_get_cpu_scaling(int cpu) 2274 2298 { 2275 2299 if (hybrid_scaling_factor) { 2276 - struct cpuinfo_x86 *c = &cpu_data(cpu); 2277 - u8 cpu_type = c->topo.intel_type; 2278 - 2279 2300 /* 2280 2301 * Return the hybrid scaling factor for P-cores and use the 2281 2302 * default core scaling for E-cores. 2282 2303 */ 2283 - if (cpu_type == INTEL_CPU_TYPE_CORE) 2304 + if (hybrid_get_cpu_type(cpu) == INTEL_CPU_TYPE_CORE) 2284 2305 return hybrid_scaling_factor; 2285 2306 2286 - if (cpu_type == INTEL_CPU_TYPE_ATOM) 2287 - return core_get_scaling(); 2307 + return core_get_scaling(); 2288 2308 } 2289 2310 2290 2310 /* Use core scaling on non-hybrid systems. 
*/ ··· 2315 2343 2316 2344 static void intel_pstate_get_cpu_pstates(struct cpudata *cpu) 2317 2345 { 2318 - int perf_ctl_max_phys = pstate_funcs.get_max_physical(cpu->cpu); 2319 2346 int perf_ctl_scaling = pstate_funcs.get_scaling(); 2320 2347 2348 + cpu->pstate.max_pstate_physical = pstate_funcs.get_max_physical(cpu->cpu); 2321 2349 cpu->pstate.min_pstate = pstate_funcs.get_min(cpu->cpu); 2322 - cpu->pstate.max_pstate_physical = perf_ctl_max_phys; 2323 2350 cpu->pstate.perf_ctl_scaling = perf_ctl_scaling; 2324 2351 2325 2352 if (hwp_active && !hwp_mode_bdw) { ··· 2326 2355 2327 2356 if (pstate_funcs.get_cpu_scaling) { 2328 2357 cpu->pstate.scaling = pstate_funcs.get_cpu_scaling(cpu->cpu); 2329 - if (cpu->pstate.scaling != perf_ctl_scaling) { 2330 - intel_pstate_hybrid_hwp_adjust(cpu); 2331 - hwp_is_hybrid = true; 2332 - } 2358 + intel_pstate_hybrid_hwp_adjust(cpu); 2333 2359 } else { 2334 2360 cpu->pstate.scaling = perf_ctl_scaling; 2335 2361 } ··· 2728 2760 X86_MATCH(INTEL_ATOM_CRESTMONT, core_funcs), 2729 2761 X86_MATCH(INTEL_ATOM_CRESTMONT_X, core_funcs), 2730 2762 X86_MATCH(INTEL_ATOM_DARKMONT_X, core_funcs), 2763 + X86_MATCH(INTEL_DIAMONDRAPIDS_X, core_funcs), 2731 2764 {} 2732 2765 }; 2733 2766 #endif ··· 3881 3912 3882 3913 } 3883 3914 3884 - mutex_lock(&intel_pstate_driver_lock); 3885 - rc = intel_pstate_register_driver(default_driver); 3886 - mutex_unlock(&intel_pstate_driver_lock); 3915 + scoped_guard(mutex, &intel_pstate_driver_lock) { 3916 + rc = intel_pstate_register_driver(default_driver); 3917 + } 3887 3918 if (rc) { 3888 3919 intel_pstate_sysfs_remove(); 3889 3920 return rc;
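The intel_pstate conversions above all follow the same shape: take the lock with `guard(mutex)(...)` and then `return` directly from error branches, letting a cleanup handler drop the lock. A minimal userspace sketch of that pattern, using a GCC/Clang `cleanup` attribute and a pthread mutex as stand-ins for the kernel primitives (`show_num_pstates`, `driver_registered`, and the return values are illustrative, not the driver's real symbols):

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t driver_lock = PTHREAD_MUTEX_INITIALIZER;

/* Cleanup handler: runs automatically when the guard variable
 * goes out of scope, on every return path. */
static void mutex_guard_release(pthread_mutex_t **m)
{
	pthread_mutex_unlock(*m);
}

/* Lock now; the cleanup attribute unlocks at end of scope. */
#define GUARD_MUTEX(m) \
	pthread_mutex_t *_guard __attribute__((cleanup(mutex_guard_release))) = (m); \
	pthread_mutex_lock(_guard)

static int driver_registered;

static int show_num_pstates(void)
{
	GUARD_MUTEX(&driver_lock);

	if (!driver_registered)
		return -11;	/* -EAGAIN; no explicit unlock needed */

	return 42;		/* placeholder for the computed value */
}
```

This is why the converted sysfs handlers can drop their `mutex_unlock()` calls and `unlock_driver:` labels: every `return` path releases the lock implicitly.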
+33 -2
drivers/cpufreq/qcom-cpufreq-nvmem.c
··· 256 256 return ret; 257 257 } 258 258 259 + static const struct of_device_id qcom_cpufreq_ipq806x_match_list[] __maybe_unused = { 260 + { .compatible = "qcom,ipq8062", .data = (const void *)QCOM_ID_IPQ8062 }, 261 + { .compatible = "qcom,ipq8064", .data = (const void *)QCOM_ID_IPQ8064 }, 262 + { .compatible = "qcom,ipq8065", .data = (const void *)QCOM_ID_IPQ8065 }, 263 + { .compatible = "qcom,ipq8066", .data = (const void *)QCOM_ID_IPQ8066 }, 264 + { .compatible = "qcom,ipq8068", .data = (const void *)QCOM_ID_IPQ8068 }, 265 + { .compatible = "qcom,ipq8069", .data = (const void *)QCOM_ID_IPQ8069 }, 266 + }; 267 + 259 268 static int qcom_cpufreq_ipq8064_name_version(struct device *cpu_dev, 260 269 struct nvmem_cell *speedbin_nvmem, 261 270 char **pvs_name, 262 271 struct qcom_cpufreq_drv *drv) 263 272 { 273 + int msm_id = -1, ret = 0; 264 274 int speed = 0, pvs = 0; 265 - int msm_id, ret = 0; 266 275 u8 *speedbin; 267 276 size_t len; 268 277 ··· 288 279 get_krait_bin_format_a(cpu_dev, &speed, &pvs, speedbin); 289 280 290 281 ret = qcom_smem_get_soc_id(&msm_id); 291 - if (ret) 282 + if (ret == -ENODEV) { 283 + const struct of_device_id *match; 284 + struct device_node *root; 285 + 286 + root = of_find_node_by_path("/"); 287 + if (!root) { 288 + ret = -ENODEV; 289 + goto exit; 290 + } 291 + 292 + /* Fallback to compatible match with no SMEM initialized */ 293 + match = of_match_node(qcom_cpufreq_ipq806x_match_list, root); 294 + of_node_put(root); 295 + if (!match) { 296 + ret = -ENODEV; 297 + goto exit; 298 + } 299 + 300 + /* We found a matching device, get the msm_id from the data entry */ 301 + msm_id = (int)(uintptr_t)match->data; 302 + ret = 0; 303 + } else if (ret) { 292 304 goto exit; 305 + } 293 306 294 307 switch (msm_id) { 295 308 case QCOM_ID_IPQ8062:
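The qcom-cpufreq-nvmem hunk above recovers the SoC id when SMEM is unavailable by matching the root node's compatible string against a small table. A self-contained sketch of that lookup, with plain strings and made-up ids standing in for `of_match_node()` and the `QCOM_ID_IPQ806x` constants:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for a struct of_device_id match table. */
struct soc_match {
	const char *compatible;
	int id;
};

static const struct soc_match ipq806x_match_list[] = {
	{ "qcom,ipq8062", 201 },	/* ids are placeholders */
	{ "qcom,ipq8064", 202 },
	{ "qcom,ipq8065", 203 },
	{ NULL, 0 }	/* sentinel, as of_device_id tables require */
};

/* Walk the table until the sentinel; return the id on a match. */
static int soc_id_from_compatible(const char *root_compatible)
{
	const struct soc_match *m;

	for (m = ipq806x_match_list; m->compatible; m++)
		if (!strcmp(m->compatible, root_compatible))
			return m->id;

	return -19;	/* -ENODEV: no match, bail out as the driver does */
}
```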
+4 -2
drivers/cpufreq/s5pv210-cpufreq.c
··· 518 518 519 519 if (policy->cpu != 0) { 520 520 ret = -EINVAL; 521 - goto out_dmc1; 521 + goto out; 522 522 } 523 523 524 524 /* ··· 530 530 if ((mem_type != LPDDR) && (mem_type != LPDDR2)) { 531 531 pr_err("CPUFreq doesn't support this memory type\n"); 532 532 ret = -EINVAL; 533 - goto out_dmc1; 533 + goto out; 534 534 } 535 535 536 536 /* Find current refresh counter and frequency each DMC */ ··· 544 544 cpufreq_generic_init(policy, s5pv210_freq_table, 40000); 545 545 return 0; 546 546 547 + out: 548 + clk_put(dmc1_clk); 547 549 out_dmc1: 548 550 clk_put(dmc0_clk); 549 551 out_dmc0:
+143 -7
drivers/cpufreq/tegra186-cpufreq.c
··· 8 8 #include <linux/module.h> 9 9 #include <linux/of.h> 10 10 #include <linux/platform_device.h> 11 + #include <linux/units.h> 11 12 12 13 #include <soc/tegra/bpmp.h> 13 14 #include <soc/tegra/bpmp-abi.h> ··· 59 58 }; 60 59 61 60 struct tegra186_cpufreq_cluster { 62 - struct cpufreq_frequency_table *table; 61 + struct cpufreq_frequency_table *bpmp_lut; 63 62 u32 ref_clk_khz; 64 63 u32 div; 65 64 }; ··· 67 66 struct tegra186_cpufreq_data { 68 67 void __iomem *regs; 69 68 const struct tegra186_cpufreq_cpu *cpus; 69 + bool icc_dram_bw_scaling; 70 70 struct tegra186_cpufreq_cluster clusters[]; 71 71 }; 72 + 73 + static int tegra_cpufreq_set_bw(struct cpufreq_policy *policy, unsigned long freq_khz) 74 + { 75 + struct tegra186_cpufreq_data *data = cpufreq_get_driver_data(); 76 + struct device *dev; 77 + int ret; 78 + 79 + dev = get_cpu_device(policy->cpu); 80 + if (!dev) 81 + return -ENODEV; 82 + 83 + struct dev_pm_opp *opp __free(put_opp) = 84 + dev_pm_opp_find_freq_exact(dev, freq_khz * HZ_PER_KHZ, true); 85 + if (IS_ERR(opp)) 86 + return PTR_ERR(opp); 87 + 88 + ret = dev_pm_opp_set_opp(dev, opp); 89 + if (ret) 90 + data->icc_dram_bw_scaling = false; 91 + 92 + return ret; 93 + } 94 + 95 + static int tegra_cpufreq_init_cpufreq_table(struct cpufreq_policy *policy, 96 + struct cpufreq_frequency_table *bpmp_lut, 97 + struct cpufreq_frequency_table **opp_table) 98 + { 99 + struct tegra186_cpufreq_data *data = cpufreq_get_driver_data(); 100 + struct cpufreq_frequency_table *freq_table = NULL; 101 + struct cpufreq_frequency_table *pos; 102 + struct device *cpu_dev; 103 + unsigned long rate; 104 + int ret, max_opps; 105 + int j = 0; 106 + 107 + cpu_dev = get_cpu_device(policy->cpu); 108 + if (!cpu_dev) { 109 + pr_err("%s: failed to get cpu%d device\n", __func__, policy->cpu); 110 + return -ENODEV; 111 + } 112 + 113 + /* Initialize OPP table mentioned in operating-points-v2 property in DT */ 114 + ret = dev_pm_opp_of_add_table_indexed(cpu_dev, 0); 115 + if (ret) { 116 + 
dev_err(cpu_dev, "Invalid or empty opp table in device tree\n"); 117 + data->icc_dram_bw_scaling = false; 118 + return ret; 119 + } 120 + 121 + max_opps = dev_pm_opp_get_opp_count(cpu_dev); 122 + if (max_opps <= 0) { 123 + dev_err(cpu_dev, "Failed to add OPPs\n"); 124 + return max_opps; 125 + } 126 + 127 + /* Disable all opps and cross-validate against LUT later */ 128 + for (rate = 0; ; rate++) { 129 + struct dev_pm_opp *opp __free(put_opp) = 130 + dev_pm_opp_find_freq_ceil(cpu_dev, &rate); 131 + if (IS_ERR(opp)) 132 + break; 133 + 134 + dev_pm_opp_disable(cpu_dev, rate); 135 + } 136 + 137 + freq_table = kcalloc((max_opps + 1), sizeof(*freq_table), GFP_KERNEL); 138 + if (!freq_table) 139 + return -ENOMEM; 140 + 141 + /* 142 + * Cross check the frequencies from BPMP-FW LUT against the OPP's present in DT. 143 + * Enable only those DT OPP's which are present in LUT also. 144 + */ 145 + cpufreq_for_each_valid_entry(pos, bpmp_lut) { 146 + struct dev_pm_opp *opp __free(put_opp) = 147 + dev_pm_opp_find_freq_exact(cpu_dev, pos->frequency * HZ_PER_KHZ, false); 148 + if (IS_ERR(opp)) 149 + continue; 150 + 151 + ret = dev_pm_opp_enable(cpu_dev, pos->frequency * HZ_PER_KHZ); 152 + if (ret < 0) 153 + return ret; 154 + 155 + freq_table[j].driver_data = pos->driver_data; 156 + freq_table[j].frequency = pos->frequency; 157 + j++; 158 + } 159 + 160 + freq_table[j].driver_data = pos->driver_data; 161 + freq_table[j].frequency = CPUFREQ_TABLE_END; 162 + 163 + *opp_table = &freq_table[0]; 164 + 165 + dev_pm_opp_set_sharing_cpus(cpu_dev, policy->cpus); 166 + 167 + /* Prime interconnect data */ 168 + tegra_cpufreq_set_bw(policy, freq_table[j - 1].frequency); 169 + 170 + return ret; 171 + } 72 172 73 173 static int tegra186_cpufreq_init(struct cpufreq_policy *policy) 74 174 { 75 175 struct tegra186_cpufreq_data *data = cpufreq_get_driver_data(); 76 176 unsigned int cluster = data->cpus[policy->cpu].bpmp_cluster_id; 177 + struct cpufreq_frequency_table *freq_table; 178 + struct 
cpufreq_frequency_table *bpmp_lut; 77 179 u32 cpu; 180 + int ret; 78 181 79 - policy->freq_table = data->clusters[cluster].table; 80 182 policy->cpuinfo.transition_latency = 300 * 1000; 81 183 policy->driver_data = NULL; 82 184 ··· 188 84 if (data->cpus[cpu].bpmp_cluster_id == cluster) 189 85 cpumask_set_cpu(cpu, policy->cpus); 190 86 } 87 + 88 + bpmp_lut = data->clusters[cluster].bpmp_lut; 89 + 90 + if (data->icc_dram_bw_scaling) { 91 + ret = tegra_cpufreq_init_cpufreq_table(policy, bpmp_lut, &freq_table); 92 + if (!ret) { 93 + policy->freq_table = freq_table; 94 + return 0; 95 + } 96 + } 97 + 98 + data->icc_dram_bw_scaling = false; 99 + policy->freq_table = bpmp_lut; 100 + pr_info("OPP tables missing from DT, EMC frequency scaling disabled\n"); 191 101 192 102 return 0; 193 103 } ··· 219 101 edvd_offset = data->cpus[cpu].edvd_offset; 220 102 writel(edvd_val, data->regs + edvd_offset); 221 103 } 104 + 105 + if (data->icc_dram_bw_scaling) 106 + tegra_cpufreq_set_bw(policy, tbl->frequency); 107 + 222 108 223 109 return 0; 224 110 } ··· 256 134 .init = tegra186_cpufreq_init, 257 135 }; 258 136 259 - static struct cpufreq_frequency_table *init_vhint_table( 137 + static struct cpufreq_frequency_table *tegra_cpufreq_bpmp_read_lut( 260 138 struct platform_device *pdev, struct tegra_bpmp *bpmp, 261 139 struct tegra186_cpufreq_cluster *cluster, unsigned int cluster_id, 262 140 int *num_rates) ··· 351 229 { 352 230 struct tegra186_cpufreq_data *data; 353 231 struct tegra_bpmp *bpmp; 232 + struct device *cpu_dev; 354 233 unsigned int i = 0, err, edvd_offset; 355 234 int num_rates = 0; 356 235 u32 edvd_val, cpu; ··· 377 254 for (i = 0; i < TEGRA186_NUM_CLUSTERS; i++) { 378 255 struct tegra186_cpufreq_cluster *cluster = &data->clusters[i]; 379 256 380 - cluster->table = init_vhint_table(pdev, bpmp, cluster, i, &num_rates); 381 - if (IS_ERR(cluster->table)) { 382 - err = PTR_ERR(cluster->table); 257 + cluster->bpmp_lut = tegra_cpufreq_bpmp_read_lut(pdev, bpmp, cluster, i, 
&num_rates); 258 + if (IS_ERR(cluster->bpmp_lut)) { 259 + err = PTR_ERR(cluster->bpmp_lut); 383 260 goto put_bpmp; 384 261 } else if (!num_rates) { 385 262 err = -EINVAL; ··· 388 265 389 266 for (cpu = 0; cpu < ARRAY_SIZE(tegra186_cpus); cpu++) { 390 267 if (data->cpus[cpu].bpmp_cluster_id == i) { 391 - edvd_val = cluster->table[num_rates - 1].driver_data; 268 + edvd_val = cluster->bpmp_lut[num_rates - 1].driver_data; 392 269 edvd_offset = data->cpus[cpu].edvd_offset; 393 270 writel(edvd_val, data->regs + edvd_offset); 394 271 } ··· 396 273 } 397 274 398 275 tegra186_cpufreq_driver.driver_data = data; 276 + 277 + /* Check for optional OPPv2 and interconnect paths on CPU0 to enable ICC scaling */ 278 + cpu_dev = get_cpu_device(0); 279 + if (!cpu_dev) { 280 + err = -EPROBE_DEFER; 281 + goto put_bpmp; 282 + } 283 + 284 + if (dev_pm_opp_of_get_opp_desc_node(cpu_dev)) { 285 + err = dev_pm_opp_of_find_icc_paths(cpu_dev, NULL); 286 + if (!err) 287 + data->icc_dram_bw_scaling = true; 288 + } 399 289 400 290 err = cpufreq_register_driver(&tegra186_cpufreq_driver); 401 291
+2 -1
drivers/cpufreq/tegra194-cpufreq.c
··· 750 750 if (IS_ERR(bpmp)) 751 751 return PTR_ERR(bpmp); 752 752 753 - read_counters_wq = alloc_workqueue("read_counters_wq", __WQ_LEGACY, 1); 753 + read_counters_wq = alloc_workqueue("read_counters_wq", 754 + __WQ_LEGACY | WQ_PERCPU, 1); 754 755 if (!read_counters_wq) { 755 756 dev_err(&pdev->dev, "fail to create_workqueue\n"); 756 757 err = -EINVAL;
+7 -5
drivers/cpuidle/cpuidle.c
··· 184 184 * cpuidle_enter_s2idle - Enter an idle state suitable for suspend-to-idle. 185 185 * @drv: cpuidle driver for the given CPU. 186 186 * @dev: cpuidle device for the given CPU. 187 + * @latency_limit_ns: Idle state exit latency limit 187 188 * 188 189 * If there are states with the ->enter_s2idle callback, find the deepest of 189 190 * them and enter it with frozen tick. 190 191 */ 191 - int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev) 192 + int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev, 193 + u64 latency_limit_ns) 192 194 { 193 195 int index; 194 196 195 197 /* 196 - * Find the deepest state with ->enter_s2idle present, which guarantees 197 - * that interrupts won't be enabled when it exits and allows the tick to 198 - * be frozen safely. 198 + * Find the deepest state with ->enter_s2idle present that meets the 199 + * specified latency limit, which guarantees that interrupts won't be 200 + * enabled when it exits and allows the tick to be frozen safely. 199 201 */ 200 - index = find_deepest_state(drv, dev, U64_MAX, 0, true); 202 + index = find_deepest_state(drv, dev, latency_limit_ns, 0, true); 201 203 if (index > 0) { 202 204 enter_s2idle_proper(drv, dev, index); 203 205 local_irq_enable();
+10
drivers/cpuidle/driver.c
··· 8 8 * This code is licenced under the GPL. 9 9 */ 10 10 11 + #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt 12 + 11 13 #include <linux/mutex.h> 12 14 #include <linux/module.h> 13 15 #include <linux/sched.h> ··· 195 193 s->exit_latency_ns = 0; 196 194 else 197 195 s->exit_latency = div_u64(s->exit_latency_ns, NSEC_PER_USEC); 196 + 197 + /* 198 + * Warn if the exit latency of a CPU idle state exceeds its 199 + * target residency which is assumed to never happen in cpuidle 200 + * in multiple places. 201 + */ 202 + if (s->exit_latency_ns > s->target_residency_ns) 203 + pr_warn("Idle state %d target residency too low\n", i); 198 204 } 199 205 } 200 206
+4
drivers/cpuidle/governor.c
··· 111 111 struct device *device = get_cpu_device(cpu); 112 112 int device_req = dev_pm_qos_raw_resume_latency(device); 113 113 int global_req = cpu_latency_qos_limit(); 114 + int global_wake_req = cpu_wakeup_latency_qos_limit(); 115 + 116 + if (global_req > global_wake_req) 117 + global_req = global_wake_req; 114 118 115 119 if (device_req > global_req) 116 120 device_req = global_req;
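The governor.c change folds the new global wakeup-latency QoS into the effective limit: the governor ends up honoring the tightest of the per-device resume latency, the global CPU latency QoS, and the wakeup latency QoS. The combination logic, extracted into a standalone function (values in the example are illustrative microseconds):

```c
#include <assert.h>

/* Effective latency requirement: min of the per-device limit and
 * both global QoS limits, computed exactly as in the hunk above. */
static int effective_latency_req(int device_req, int global_req,
				 int global_wake_req)
{
	if (global_req > global_wake_req)
		global_req = global_wake_req;

	if (device_req > global_req)
		device_req = global_req;

	return device_req;
}
```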
+5 -4
drivers/cpuidle/governors/menu.c
··· 317 317 } 318 318 319 319 /* 320 - * Use a physical idle state, not busy polling, unless a timer 321 - * is going to trigger soon enough or the exit latency of the 322 - * idle state in question is greater than the predicted idle 323 - * duration. 320 + * Use a physical idle state instead of busy polling so long as 321 + * its target residency is below the residency threshold, its 322 + * exit latency is not greater than the predicted idle duration, 323 + * and the next timer doesn't expire soon. 324 324 */ 325 325 if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) && 326 + s->target_residency_ns < RESIDENCY_THRESHOLD_NS && 326 327 s->target_residency_ns <= data->next_timer_ns && 327 328 s->exit_latency_ns <= predicted_ns) { 328 329 predicted_ns = s->target_residency_ns;
+72 -87
drivers/cpuidle/governors/teo.c
··· 76 76 * likely woken up by a non-timer wakeup source). 77 77 * 78 78 * 2. If the second sum computed in step 1 is greater than a half of the sum of 79 - * both metrics for the candidate state bin and all subsequent bins(if any), 79 + * both metrics for the candidate state bin and all subsequent bins (if any), 80 80 * a shallower idle state is likely to be more suitable, so look for it. 81 81 * 82 82 * - Traverse the enabled idle states shallower than the candidate one in the ··· 133 133 * @sleep_length_ns: Time till the closest timer event (at the selection time). 134 134 * @state_bins: Idle state data bins for this CPU. 135 135 * @total: Grand total of the "intercepts" and "hits" metrics for all bins. 136 + * @total_tick: Wakeups by the scheduler tick. 136 137 * @tick_intercepts: "Intercepts" before TICK_NSEC. 137 138 * @short_idles: Wakeups after short idle periods. 138 - * @artificial_wakeup: Set if the wakeup has been triggered by a safety net. 139 + * @tick_wakeup: Set if the last wakeup was by the scheduler tick. 139 140 */ 140 141 struct teo_cpu { 141 142 s64 sleep_length_ns; 142 143 struct teo_bin state_bins[CPUIDLE_STATE_MAX]; 143 144 unsigned int total; 145 + unsigned int total_tick; 144 146 unsigned int tick_intercepts; 145 147 unsigned int short_idles; 146 - bool artificial_wakeup; 148 + bool tick_wakeup; 147 149 }; 148 150 149 151 static DEFINE_PER_CPU(struct teo_cpu, teo_cpus); 152 + 153 + static void teo_decay(unsigned int *metric) 154 + { 155 + unsigned int delta = *metric >> DECAY_SHIFT; 156 + 157 + if (delta) 158 + *metric -= delta; 159 + else 160 + *metric = 0; 161 + } 150 162 151 163 /** 152 164 * teo_update - Update CPU metrics after wakeup. 
··· 167 155 */ 168 156 static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) 169 157 { 170 - struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); 158 + struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus); 171 159 int i, idx_timer = 0, idx_duration = 0; 172 - s64 target_residency_ns; 173 - u64 measured_ns; 160 + s64 target_residency_ns, measured_ns; 161 + unsigned int total = 0; 174 162 175 - cpu_data->short_idles -= cpu_data->short_idles >> DECAY_SHIFT; 163 + teo_decay(&cpu_data->short_idles); 176 164 177 - if (cpu_data->artificial_wakeup) { 165 + if (dev->poll_time_limit) { 166 + dev->poll_time_limit = false; 178 167 /* 179 - * If one of the safety nets has triggered, assume that this 168 + * Polling state timeout has triggered, so assume that this 180 169 * might have been a long sleep. 181 170 */ 182 - measured_ns = U64_MAX; 171 + measured_ns = S64_MAX; 183 172 } else { 184 - u64 lat_ns = drv->states[dev->last_state_idx].exit_latency_ns; 173 + s64 lat_ns = drv->states[dev->last_state_idx].exit_latency_ns; 185 174 186 175 measured_ns = dev->last_residency_ns; 187 176 /* ··· 209 196 for (i = 0; i < drv->state_count; i++) { 210 197 struct teo_bin *bin = &cpu_data->state_bins[i]; 211 198 212 - bin->hits -= bin->hits >> DECAY_SHIFT; 213 - bin->intercepts -= bin->intercepts >> DECAY_SHIFT; 199 + teo_decay(&bin->hits); 200 + total += bin->hits; 201 + teo_decay(&bin->intercepts); 202 + total += bin->intercepts; 214 203 215 204 target_residency_ns = drv->states[i].target_residency_ns; 216 205 ··· 223 208 } 224 209 } 225 210 226 - cpu_data->tick_intercepts -= cpu_data->tick_intercepts >> DECAY_SHIFT; 211 + cpu_data->total = total + PULSE; 212 + 213 + teo_decay(&cpu_data->tick_intercepts); 214 + 215 + teo_decay(&cpu_data->total_tick); 216 + if (cpu_data->tick_wakeup) { 217 + cpu_data->total_tick += PULSE; 218 + /* 219 + * If tick wakeups dominate the wakeup pattern, count this one 220 + * as a hit on the deepest available idle state to 
increase the 221 + * likelihood of stopping the tick. 222 + */ 223 + if (3 * cpu_data->total_tick > 2 * cpu_data->total) { 224 + cpu_data->state_bins[drv->state_count-1].hits += PULSE; 225 + return; 226 + } 227 + } 228 + 227 229 /* 228 230 * If the measured idle duration falls into the same bin as the sleep 229 231 * length, this is a "hit", so update the "hits" metric for that bin. ··· 251 219 cpu_data->state_bins[idx_timer].hits += PULSE; 252 220 } else { 253 221 cpu_data->state_bins[idx_duration].intercepts += PULSE; 254 - if (TICK_NSEC <= measured_ns) 222 + if (measured_ns <= TICK_NSEC) 255 223 cpu_data->tick_intercepts += PULSE; 256 224 } 257 - 258 - cpu_data->total -= cpu_data->total >> DECAY_SHIFT; 259 - cpu_data->total += PULSE; 260 - } 261 - 262 - static bool teo_state_ok(int i, struct cpuidle_driver *drv) 263 - { 264 - return !tick_nohz_tick_stopped() || 265 - drv->states[i].target_residency_ns >= TICK_NSEC; 266 225 } 267 226 268 227 /** ··· 262 239 * @dev: Target CPU. 263 240 * @state_idx: Index of the capping idle state. 264 241 * @duration_ns: Idle duration value to match. 265 - * @no_poll: Don't consider polling states. 
266 242 */ 267 243 static int teo_find_shallower_state(struct cpuidle_driver *drv, 268 244 struct cpuidle_device *dev, int state_idx, 269 - s64 duration_ns, bool no_poll) 245 + s64 duration_ns) 270 246 { 271 247 int i; 272 248 273 249 for (i = state_idx - 1; i >= 0; i--) { 274 - if (dev->states_usage[i].disable || 275 - (no_poll && drv->states[i].flags & CPUIDLE_FLAG_POLLING)) 250 + if (dev->states_usage[i].disable) 276 251 continue; 277 252 278 253 state_idx = i; ··· 289 268 static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, 290 269 bool *stop_tick) 291 270 { 292 - struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); 271 + struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus); 293 272 s64 latency_req = cpuidle_governor_latency_req(dev->cpu); 294 273 ktime_t delta_tick = TICK_NSEC / 2; 295 274 unsigned int idx_intercept_sum = 0; ··· 377 356 * better choice. 378 357 */ 379 358 if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) { 380 - int first_suitable_idx = idx; 359 + int min_idx = idx0; 360 + 361 + if (tick_nohz_tick_stopped()) { 362 + /* 363 + * Look for the shallowest idle state below the current 364 + * candidate one whose target residency is at least 365 + * equal to the tick period length. 366 + */ 367 + while (min_idx < idx && 368 + drv->states[min_idx].target_residency_ns < TICK_NSEC) 369 + min_idx++; 370 + } 381 371 382 372 /* 383 373 * Look for the deepest idle state whose target residency had ··· 398 366 * Take the possible duration limitation present if the tick 399 367 * has been stopped already into account. 400 368 */ 401 - intercept_sum = 0; 402 - 403 - for (i = idx - 1; i >= 0; i--) { 404 - struct teo_bin *bin = &cpu_data->state_bins[i]; 405 - 406 - intercept_sum += bin->intercepts; 407 - 408 - if (2 * intercept_sum > idx_intercept_sum) { 409 - /* 410 - * Use the current state unless it is too 411 - * shallow or disabled, in which case take the 412 - * first enabled state that is deep enough. 
413 - */ 414 - if (teo_state_ok(i, drv) && 415 - !dev->states_usage[i].disable) { 416 - idx = i; 417 - break; 418 - } 419 - idx = first_suitable_idx; 420 - break; 421 - } 369 + for (i = idx - 1, intercept_sum = 0; i >= min_idx; i--) { 370 + intercept_sum += cpu_data->state_bins[i].intercepts; 422 371 423 372 if (dev->states_usage[i].disable) 424 373 continue; 425 374 426 - if (teo_state_ok(i, drv)) { 427 - /* 428 - * The current state is deep enough, but still 429 - * there may be a better one. 430 - */ 431 - first_suitable_idx = i; 432 - continue; 433 - } 434 - 435 - /* 436 - * The current state is too shallow, so if no suitable 437 - * states other than the initial candidate have been 438 - * found, give up (the remaining states to check are 439 - * shallower still), but otherwise the first suitable 440 - * state other than the initial candidate may turn out 441 - * to be preferable. 442 - */ 443 - if (first_suitable_idx == idx) 375 + idx = i; 376 + if (2 * intercept_sum > idx_intercept_sum) 444 377 break; 445 378 } 446 379 } ··· 455 458 * If the closest expected timer is before the target residency of the 456 459 * candidate state, a shallower one needs to be found. 
457 460 */ 458 - if (drv->states[idx].target_residency_ns > duration_ns) { 459 - i = teo_find_shallower_state(drv, dev, idx, duration_ns, false); 460 - if (teo_state_ok(i, drv)) 461 - idx = i; 462 - } 461 + if (drv->states[idx].target_residency_ns > duration_ns) 462 + idx = teo_find_shallower_state(drv, dev, idx, duration_ns); 463 463 464 464 /* 465 465 * If the selected state's target residency is below the tick length ··· 484 490 */ 485 491 if (idx > idx0 && 486 492 drv->states[idx].target_residency_ns > delta_tick) 487 - idx = teo_find_shallower_state(drv, dev, idx, delta_tick, false); 493 + idx = teo_find_shallower_state(drv, dev, idx, delta_tick); 488 494 489 495 out_tick: 490 496 *stop_tick = false; ··· 498 504 */ 499 505 static void teo_reflect(struct cpuidle_device *dev, int state) 500 506 { 501 - struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); 507 + struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus); 508 + 509 + cpu_data->tick_wakeup = tick_nohz_idle_got_tick(); 502 510 503 511 dev->last_state_idx = state; 504 - if (dev->poll_time_limit || 505 - (tick_nohz_idle_got_tick() && cpu_data->sleep_length_ns > TICK_NSEC)) { 506 - /* 507 - * The wakeup was not "genuine", but triggered by one of the 508 - * safety nets. 509 - */ 510 - dev->poll_time_limit = false; 511 - cpu_data->artificial_wakeup = true; 512 - } else { 513 - cpu_data->artificial_wakeup = false; 514 - } 515 512 } 516 513 517 514 /**
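The teo rewrite factors its recurring decay arithmetic into `teo_decay()`. A standalone copy showing the behavior (DECAY_SHIFT is assumed to be 3 here, matching teo.c): each update removes 1/2^DECAY_SHIFT of a metric, and a value too small to yield a nonzero delta is flushed straight to zero, so stale bins cannot linger at a small residual count forever.

```c
#include <assert.h>

#define DECAY_SHIFT 3	/* assumed to match teo.c */

/* Remove 1/8 of the metric per update; flush small residues to 0. */
static void teo_decay(unsigned int *metric)
{
	unsigned int delta = *metric >> DECAY_SHIFT;

	if (delta)
		*metric -= delta;
	else
		*metric = 0;
}
```

For example, a bin at 1024 decays to 896 in one step, while a bin already below 8 drops to 0 immediately rather than sticking at a nonzero floor.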
+4
drivers/cpuidle/poll_state.c
··· 4 4 */ 5 5 6 6 #include <linux/cpuidle.h> 7 + #include <linux/export.h> 8 + #include <linux/irqflags.h> 7 9 #include <linux/sched.h> 8 10 #include <linux/sched/clock.h> 9 11 #include <linux/sched/idle.h> 12 + #include <linux/sprintf.h> 13 + #include <linux/types.h> 10 14 11 15 #define POLL_IDLE_RELAX_COUNT 200 12 16
+1 -1
drivers/devfreq/devfreq.c
··· 20 20 #include <linux/stat.h> 21 21 #include <linux/pm_opp.h> 22 22 #include <linux/devfreq.h> 23 + #include <linux/devfreq-governor.h> 23 24 #include <linux/workqueue.h> 24 25 #include <linux/platform_device.h> 25 26 #include <linux/list.h> ··· 29 28 #include <linux/of.h> 30 29 #include <linux/pm_qos.h> 31 30 #include <linux/units.h> 32 - #include "governor.h" 33 31 34 32 #define CREATE_TRACE_POINTS 35 33 #include <trace/events/devfreq.h>
+4 -29
drivers/devfreq/governor.h include/linux/devfreq-governor.h
··· 5 5 * Copyright (C) 2011 Samsung Electronics 6 6 * MyungJoo Ham <myungjoo.ham@samsung.com> 7 7 * 8 - * This header is for devfreq governors in drivers/devfreq/ 8 + * This header is for devfreq governors 9 9 */ 10 10 11 - #ifndef _GOVERNOR_H 12 - #define _GOVERNOR_H 11 + #ifndef __LINUX_DEVFREQ_DEVFREQ_H__ 12 + #define __LINUX_DEVFREQ_DEVFREQ_H__ 13 13 14 14 #include <linux/devfreq.h> 15 15 ··· 46 46 */ 47 47 #define DEVFREQ_GOV_ATTR_POLLING_INTERVAL BIT(0) 48 48 #define DEVFREQ_GOV_ATTR_TIMER BIT(1) 49 - 50 - /** 51 - * struct devfreq_cpu_data - Hold the per-cpu data 52 - * @node: list node 53 - * @dev: reference to cpu device. 54 - * @first_cpu: the cpumask of the first cpu of a policy. 55 - * @opp_table: reference to cpu opp table. 56 - * @cur_freq: the current frequency of the cpu. 57 - * @min_freq: the min frequency of the cpu. 58 - * @max_freq: the max frequency of the cpu. 59 - * 60 - * This structure stores the required cpu_data of a cpu. 61 - * This is auto-populated by the governor. 62 - */ 63 - struct devfreq_cpu_data { 64 - struct list_head node; 65 - 66 - struct device *dev; 67 - unsigned int first_cpu; 68 - 69 - struct opp_table *opp_table; 70 - unsigned int cur_freq; 71 - unsigned int min_freq; 72 - unsigned int max_freq; 73 - }; 74 49 75 50 /** 76 51 * struct devfreq_governor - Devfreq policy governor ··· 99 124 100 125 return df->profile->get_dev_status(df->dev.parent, &df->last_status); 101 126 } 102 - #endif /* _GOVERNOR_H */ 127 + #endif /* __LINUX_DEVFREQ_DEVFREQ_H__ */
+26 -1
drivers/devfreq/governor_passive.c
··· 14 14 #include <linux/slab.h> 15 15 #include <linux/device.h> 16 16 #include <linux/devfreq.h> 17 + #include <linux/devfreq-governor.h> 17 18 #include <linux/units.h> 18 - #include "governor.h" 19 + 20 + /** 21 + * struct devfreq_cpu_data - Hold the per-cpu data 22 + * @node: list node 23 + * @dev: reference to cpu device. 24 + * @first_cpu: the cpumask of the first cpu of a policy. 25 + * @opp_table: reference to cpu opp table. 26 + * @cur_freq: the current frequency of the cpu. 27 + * @min_freq: the min frequency of the cpu. 28 + * @max_freq: the max frequency of the cpu. 29 + * 30 + * This structure stores the required cpu_data of a cpu. 31 + * This is auto-populated by the governor. 32 + */ 33 + struct devfreq_cpu_data { 34 + struct list_head node; 35 + 36 + struct device *dev; 37 + unsigned int first_cpu; 38 + 39 + struct opp_table *opp_table; 40 + unsigned int cur_freq; 41 + unsigned int min_freq; 42 + unsigned int max_freq; 43 + }; 19 44 20 45 static struct devfreq_cpu_data * 21 46 get_parent_cpu_data(struct devfreq_passive_data *p_data,
+1 -1
drivers/devfreq/governor_performance.c
··· 7 7 */ 8 8 9 9 #include <linux/devfreq.h> 10 + #include <linux/devfreq-governor.h> 10 11 #include <linux/module.h> 11 - #include "governor.h" 12 12 13 13 static int devfreq_performance_func(struct devfreq *df, 14 14 unsigned long *freq)
+1 -1
drivers/devfreq/governor_powersave.c
··· 7 7 */ 8 8 9 9 #include <linux/devfreq.h> 10 + #include <linux/devfreq-governor.h> 10 11 #include <linux/module.h> 11 - #include "governor.h" 12 12 13 13 static int devfreq_powersave_func(struct devfreq *df, 14 14 unsigned long *freq)
+3 -3
drivers/devfreq/governor_simpleondemand.c
··· 9 9 #include <linux/errno.h> 10 10 #include <linux/module.h> 11 11 #include <linux/devfreq.h> 12 + #include <linux/devfreq-governor.h> 12 13 #include <linux/math64.h> 13 - #include "governor.h" 14 14 15 15 /* Default constants for DevFreq-Simple-Ondemand (DFSO) */ 16 16 #define DFSO_UPTHRESHOLD (90) 17 - #define DFSO_DOWNDIFFERENCTIAL (5) 17 + #define DFSO_DOWNDIFFERENTIAL (5) 18 18 static int devfreq_simple_ondemand_func(struct devfreq *df, 19 19 unsigned long *freq) 20 20 { ··· 22 22 struct devfreq_dev_status *stat; 23 23 unsigned long long a, b; 24 24 unsigned int dfso_upthreshold = DFSO_UPTHRESHOLD; 25 - unsigned int dfso_downdifferential = DFSO_DOWNDIFFERENCTIAL; 25 + unsigned int dfso_downdifferential = DFSO_DOWNDIFFERENTIAL; 26 26 struct devfreq_simple_ondemand_data *data = df->data; 27 27 28 28 err = devfreq_update_stats(df);
+1 -1
drivers/devfreq/governor_userspace.c
··· 9 9 #include <linux/slab.h> 10 10 #include <linux/device.h> 11 11 #include <linux/devfreq.h> 12 + #include <linux/devfreq-governor.h> 12 13 #include <linux/kstrtox.h> 13 14 #include <linux/pm.h> 14 15 #include <linux/mutex.h> 15 16 #include <linux/module.h> 16 - #include "governor.h" 17 17 18 18 struct userspace_data { 19 19 unsigned long user_frequency;
+3 -3
drivers/devfreq/hisi_uncore_freq.c
··· 9 9 #include <linux/bits.h> 10 10 #include <linux/cleanup.h> 11 11 #include <linux/devfreq.h> 12 + #include <linux/devfreq-governor.h> 12 13 #include <linux/device.h> 13 14 #include <linux/dev_printk.h> 14 15 #include <linux/errno.h> ··· 26 25 #include <linux/topology.h> 27 26 #include <linux/units.h> 28 27 #include <acpi/pcc.h> 29 - 30 - #include "governor.h" 31 28 32 29 struct hisi_uncore_pcc_data { 33 30 u16 status; ··· 264 265 dev_err(dev, "Failed to get opp for freq %lu hz\n", *freq); 265 266 return PTR_ERR(opp); 266 267 } 267 - dev_pm_opp_put(opp); 268 268 269 269 data = (u32)(dev_pm_opp_get_freq(opp) / HZ_PER_MHZ); 270 + 271 + dev_pm_opp_put(opp); 270 272 271 273 return hisi_uncore_cmd_send(uncore, HUCF_PCC_CMD_SET_FREQ, &data); 272 274 }
+5 -10
drivers/devfreq/tegra30-devfreq.c
··· 9 9 #include <linux/clk.h> 10 10 #include <linux/cpufreq.h> 11 11 #include <linux/devfreq.h> 12 + #include <linux/devfreq-governor.h> 12 13 #include <linux/interrupt.h> 13 14 #include <linux/io.h> 14 15 #include <linux/irq.h> 16 + #include <linux/minmax.h> 15 17 #include <linux/module.h> 16 18 #include <linux/of.h> 17 19 #include <linux/platform_device.h> ··· 22 20 #include <linux/workqueue.h> 23 21 24 22 #include <soc/tegra/fuse.h> 25 - 26 - #include "governor.h" 27 23 28 24 #define ACTMON_GLB_STATUS 0x0 29 25 #define ACTMON_GLB_PERIOD_CTRL 0x4 ··· 326 326 unsigned int i; 327 327 const struct tegra_actmon_emc_ratio *ratio = actmon_emc_ratios; 328 328 329 - for (i = 0; i < ARRAY_SIZE(actmon_emc_ratios); i++, ratio++) { 330 - if (cpu_freq >= ratio->cpu_freq) { 331 - if (ratio->emc_freq >= tegra->max_freq) 332 - return tegra->max_freq; 333 - else 334 - return ratio->emc_freq; 335 - } 336 - } 329 + for (i = 0; i < ARRAY_SIZE(actmon_emc_ratios); i++, ratio++) 330 + if (cpu_freq >= ratio->cpu_freq) 331 + return min(ratio->emc_freq, tegra->max_freq); 337 332 338 333 return 0; 339 334 }
+38 -31
drivers/opp/core.c
··· 309 309 */ 310 310 unsigned long dev_pm_opp_get_max_clock_latency(struct device *dev) 311 311 { 312 - struct opp_table *opp_table __free(put_opp_table); 312 + struct opp_table *opp_table __free(put_opp_table) = 313 + _find_opp_table(dev); 313 314 314 - opp_table = _find_opp_table(dev); 315 315 if (IS_ERR(opp_table)) 316 316 return 0; 317 317 ··· 327 327 */ 328 328 unsigned long dev_pm_opp_get_max_volt_latency(struct device *dev) 329 329 { 330 - struct opp_table *opp_table __free(put_opp_table); 331 330 struct dev_pm_opp *opp; 332 331 struct regulator *reg; 333 332 unsigned long latency_ns = 0; ··· 336 337 unsigned long max; 337 338 } *uV; 338 339 339 - opp_table = _find_opp_table(dev); 340 + struct opp_table *opp_table __free(put_opp_table) = 341 + _find_opp_table(dev); 342 + 340 343 if (IS_ERR(opp_table)) 341 344 return 0; 342 345 ··· 410 409 */ 411 410 unsigned long dev_pm_opp_get_suspend_opp_freq(struct device *dev) 412 411 { 413 - struct opp_table *opp_table __free(put_opp_table); 414 412 unsigned long freq = 0; 415 413 416 - opp_table = _find_opp_table(dev); 414 + struct opp_table *opp_table __free(put_opp_table) = 415 + _find_opp_table(dev); 416 + 417 417 if (IS_ERR(opp_table)) 418 418 return 0; 419 419 ··· 449 447 */ 450 448 int dev_pm_opp_get_opp_count(struct device *dev) 451 449 { 452 - struct opp_table *opp_table __free(put_opp_table); 450 + struct opp_table *opp_table __free(put_opp_table) = 451 + _find_opp_table(dev); 453 452 454 - opp_table = _find_opp_table(dev); 455 453 if (IS_ERR(opp_table)) { 456 454 dev_dbg(dev, "%s: OPP table not found (%ld)\n", 457 455 __func__, PTR_ERR(opp_table)); ··· 607 605 unsigned long opp_key, unsigned long key), 608 606 bool (*assert)(struct opp_table *opp_table, unsigned int index)) 609 607 { 610 - struct opp_table *opp_table __free(put_opp_table); 608 + struct opp_table *opp_table __free(put_opp_table) = 609 + _find_opp_table(dev); 611 610 612 - opp_table = _find_opp_table(dev); 613 611 if (IS_ERR(opp_table)) { 614 
612 dev_err(dev, "%s: OPP table not found (%ld)\n", __func__, 615 613 PTR_ERR(opp_table)); ··· 1412 1410 */ 1413 1411 int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq) 1414 1412 { 1415 - struct opp_table *opp_table __free(put_opp_table); 1416 1413 struct dev_pm_opp *opp __free(put_opp) = NULL; 1417 1414 unsigned long freq = 0, temp_freq; 1418 1415 bool forced = false; 1419 1416 1420 - opp_table = _find_opp_table(dev); 1417 + struct opp_table *opp_table __free(put_opp_table) = 1418 + _find_opp_table(dev); 1419 + 1421 1420 if (IS_ERR(opp_table)) { 1422 1421 dev_err(dev, "%s: device's opp table doesn't exist\n", __func__); 1423 1422 return PTR_ERR(opp_table); ··· 1480 1477 */ 1481 1478 int dev_pm_opp_set_opp(struct device *dev, struct dev_pm_opp *opp) 1482 1479 { 1483 - struct opp_table *opp_table __free(put_opp_table); 1480 + struct opp_table *opp_table __free(put_opp_table) = 1481 + _find_opp_table(dev); 1484 1482 1485 - opp_table = _find_opp_table(dev); 1486 1483 if (IS_ERR(opp_table)) { 1487 1484 dev_err(dev, "%s: device opp doesn't exist\n", __func__); 1488 1485 return PTR_ERR(opp_table); ··· 1797 1794 */ 1798 1795 void dev_pm_opp_remove(struct device *dev, unsigned long freq) 1799 1796 { 1800 - struct opp_table *opp_table __free(put_opp_table); 1801 1797 struct dev_pm_opp *opp = NULL, *iter; 1802 1798 1803 - opp_table = _find_opp_table(dev); 1799 + struct opp_table *opp_table __free(put_opp_table) = 1800 + _find_opp_table(dev); 1801 + 1804 1802 if (IS_ERR(opp_table)) 1805 1803 return; 1806 1804 ··· 1889 1885 */ 1890 1886 void dev_pm_opp_remove_all_dynamic(struct device *dev) 1891 1887 { 1892 - struct opp_table *opp_table __free(put_opp_table); 1888 + struct opp_table *opp_table __free(put_opp_table) = 1889 + _find_opp_table(dev); 1893 1890 1894 - opp_table = _find_opp_table(dev); 1895 1891 if (IS_ERR(opp_table)) 1896 1892 return; 1897 1893 ··· 2875 2871 bool availability_req) 2876 2872 { 2877 2873 struct dev_pm_opp *opp __free(put_opp) = 
ERR_PTR(-ENODEV), *tmp_opp; 2878 - struct opp_table *opp_table __free(put_opp_table); 2879 2874 2880 2875 /* Find the opp_table */ 2881 - opp_table = _find_opp_table(dev); 2876 + struct opp_table *opp_table __free(put_opp_table) = 2877 + _find_opp_table(dev); 2878 + 2882 2879 if (IS_ERR(opp_table)) { 2883 2880 dev_warn(dev, "%s: Device OPP not found (%ld)\n", __func__, 2884 2881 PTR_ERR(opp_table)); ··· 2937 2932 2938 2933 { 2939 2934 struct dev_pm_opp *opp __free(put_opp) = ERR_PTR(-ENODEV), *tmp_opp; 2940 - struct opp_table *opp_table __free(put_opp_table); 2941 2935 int r; 2942 2936 2943 2937 /* Find the opp_table */ 2944 - opp_table = _find_opp_table(dev); 2938 + struct opp_table *opp_table __free(put_opp_table) = 2939 + _find_opp_table(dev); 2940 + 2945 2941 if (IS_ERR(opp_table)) { 2946 2942 r = PTR_ERR(opp_table); 2947 2943 dev_warn(dev, "%s: Device OPP not found (%d)\n", __func__, r); ··· 2992 2986 */ 2993 2987 int dev_pm_opp_sync_regulators(struct device *dev) 2994 2988 { 2995 - struct opp_table *opp_table __free(put_opp_table); 2996 2989 struct regulator *reg; 2997 2990 int ret, i; 2998 2991 2999 2992 /* Device may not have OPP table */ 3000 - opp_table = _find_opp_table(dev); 2993 + struct opp_table *opp_table __free(put_opp_table) = 2994 + _find_opp_table(dev); 2995 + 3001 2996 if (IS_ERR(opp_table)) 3002 2997 return 0; 3003 2998 ··· 3069 3062 */ 3070 3063 int dev_pm_opp_register_notifier(struct device *dev, struct notifier_block *nb) 3071 3064 { 3072 - struct opp_table *opp_table __free(put_opp_table); 3065 + struct opp_table *opp_table __free(put_opp_table) = 3066 + _find_opp_table(dev); 3073 3067 3074 - opp_table = _find_opp_table(dev); 3075 3068 if (IS_ERR(opp_table)) 3076 3069 return PTR_ERR(opp_table); 3077 3070 ··· 3089 3082 int dev_pm_opp_unregister_notifier(struct device *dev, 3090 3083 struct notifier_block *nb) 3091 3084 { 3092 - struct opp_table *opp_table __free(put_opp_table); 3085 + struct opp_table *opp_table __free(put_opp_table) = 3086 
+ _find_opp_table(dev); 3093 3087 3094 - opp_table = _find_opp_table(dev); 3095 3088 if (IS_ERR(opp_table)) 3096 3089 return PTR_ERR(opp_table); 3097 3090 ··· 3108 3101 */ 3109 3102 void dev_pm_opp_remove_table(struct device *dev) 3110 3103 { 3111 - struct opp_table *opp_table __free(put_opp_table); 3112 - 3113 3104 /* Check for existing table for 'dev' */ 3114 - opp_table = _find_opp_table(dev); 3105 + struct opp_table *opp_table __free(put_opp_table) = 3106 + _find_opp_table(dev); 3107 + 3115 3108 if (IS_ERR(opp_table)) { 3116 3109 int error = PTR_ERR(opp_table); 3117 3110
+9 -7
drivers/opp/cpu.c
··· 56 56 return -ENOMEM; 57 57 58 58 for (i = 0, rate = 0; i < max_opps; i++, rate++) { 59 - struct dev_pm_opp *opp __free(put_opp); 60 - 61 59 /* find next rate */ 62 - opp = dev_pm_opp_find_freq_ceil(dev, &rate); 60 + struct dev_pm_opp *opp __free(put_opp) = 61 + dev_pm_opp_find_freq_ceil(dev, &rate); 62 + 63 63 if (IS_ERR(opp)) { 64 64 ret = PTR_ERR(opp); 65 65 goto out; ··· 154 154 int dev_pm_opp_set_sharing_cpus(struct device *cpu_dev, 155 155 const struct cpumask *cpumask) 156 156 { 157 - struct opp_table *opp_table __free(put_opp_table); 158 157 struct opp_device *opp_dev; 159 158 struct device *dev; 160 159 int cpu; 161 160 162 - opp_table = _find_opp_table(cpu_dev); 161 + struct opp_table *opp_table __free(put_opp_table) = 162 + _find_opp_table(cpu_dev); 163 + 163 164 if (IS_ERR(opp_table)) 164 165 return PTR_ERR(opp_table); 165 166 ··· 202 201 */ 203 202 int dev_pm_opp_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask) 204 203 { 205 - struct opp_table *opp_table __free(put_opp_table); 206 204 struct opp_device *opp_dev; 207 205 208 - opp_table = _find_opp_table(cpu_dev); 206 + struct opp_table *opp_table __free(put_opp_table) = 207 + _find_opp_table(cpu_dev); 208 + 209 209 if (IS_ERR(opp_table)) 210 210 return PTR_ERR(opp_table); 211 211
+70 -55
drivers/opp/of.c
··· 45 45 struct opp_table *_managed_opp(struct device *dev, int index) 46 46 { 47 47 struct opp_table *opp_table, *managed_table = NULL; 48 - struct device_node *np __free(device_node); 49 48 50 - np = _opp_of_get_opp_desc_node(dev->of_node, index); 49 + struct device_node *np __free(device_node) = 50 + _opp_of_get_opp_desc_node(dev->of_node, index); 51 + 51 52 if (!np) 52 53 return NULL; 53 54 ··· 96 95 /* The caller must call dev_pm_opp_put_opp_table() after the table is used */ 97 96 static struct opp_table *_find_table_of_opp_np(struct device_node *opp_np) 98 97 { 99 - struct device_node *opp_table_np __free(device_node); 100 98 struct opp_table *opp_table; 101 99 102 - opp_table_np = of_get_parent(opp_np); 100 + struct device_node *opp_table_np __free(device_node) = 101 + of_get_parent(opp_np); 102 + 103 103 if (!opp_table_np) 104 104 return ERR_PTR(-ENODEV); 105 105 ··· 148 146 struct device_node *opp_np) 149 147 { 150 148 struct opp_table **required_opp_tables; 151 - struct device_node *np __free(device_node); 152 149 bool lazy = false; 153 150 int count, i, size; 154 151 155 152 /* Traversing the first OPP node is all we need */ 156 - np = of_get_next_available_child(opp_np, NULL); 153 + struct device_node *np __free(device_node) = 154 + of_get_next_available_child(opp_np, NULL); 155 + 157 156 if (!np) { 158 157 dev_warn(dev, "Empty OPP table\n"); 159 158 return; ··· 174 171 opp_table->required_opp_count = count; 175 172 176 173 for (i = 0; i < count; i++) { 177 - struct device_node *required_np __free(device_node); 174 + struct device_node *required_np __free(device_node) = 175 + of_parse_required_opp(np, i); 178 176 179 - required_np = of_parse_required_opp(np, i); 180 177 if (!required_np) { 181 178 _opp_table_free_required_tables(opp_table); 182 179 return; ··· 202 199 void _of_init_opp_table(struct opp_table *opp_table, struct device *dev, 203 200 int index) 204 201 { 205 - struct device_node *np __free(device_node), *opp_np; 202 + struct device_node 
*opp_np; 206 203 u32 val; 207 204 208 205 /* 209 206 * Only required for backward compatibility with v1 bindings, but isn't 210 207 * harmful for other cases. And so we do it unconditionally. 211 208 */ 212 - np = of_node_get(dev->of_node); 209 + struct device_node *np __free(device_node) = of_node_get(dev->of_node); 210 + 213 211 if (!np) 214 212 return; 215 213 ··· 277 273 static int _link_required_opps(struct dev_pm_opp *opp, 278 274 struct opp_table *required_table, int index) 279 275 { 280 - struct device_node *np __free(device_node); 276 + struct device_node *np __free(device_node) = 277 + of_parse_required_opp(opp->np, index); 281 278 282 - np = of_parse_required_opp(opp->np, index); 283 279 if (unlikely(!np)) 284 280 return -ENODEV; 285 281 ··· 353 349 guard(mutex)(&opp_table_lock); 354 350 355 351 list_for_each_entry_safe(opp_table, temp, &lazy_opp_tables, lazy) { 356 - struct device_node *opp_np __free(device_node); 357 352 bool lazy = false; 358 353 359 354 /* opp_np can't be invalid here */ 360 - opp_np = of_get_next_available_child(opp_table->np, NULL); 355 + struct device_node *opp_np __free(device_node) = 356 + of_get_next_available_child(opp_table->np, NULL); 361 357 362 358 for (i = 0; i < opp_table->required_opp_count; i++) { 363 - struct device_node *required_np __free(device_node) = NULL; 364 - struct device_node *required_table_np __free(device_node) = NULL; 365 - 366 359 required_opp_tables = opp_table->required_opp_tables; 367 360 368 361 /* Required opp-table is already parsed */ ··· 367 366 continue; 368 367 369 368 /* required_np can't be invalid here */ 370 - required_np = of_parse_required_opp(opp_np, i); 371 - required_table_np = of_get_parent(required_np); 369 + struct device_node *required_np __free(device_node) = 370 + of_parse_required_opp(opp_np, i); 371 + struct device_node *required_table_np __free(device_node) = 372 + of_get_parent(required_np); 372 373 373 374 /* 374 375 * Newly added table isn't the required opp-table for ··· 
405 402 static int _bandwidth_supported(struct device *dev, struct opp_table *opp_table) 406 403 { 407 404 struct device_node *opp_np __free(device_node) = NULL; 408 - struct device_node *np __free(device_node) = NULL; 409 405 struct property *prop; 410 406 411 407 if (!opp_table) { 412 - struct device_node *np __free(device_node); 408 + struct device_node *np __free(device_node) = 409 + of_node_get(dev->of_node); 413 410 414 - np = of_node_get(dev->of_node); 415 411 if (!np) 416 412 return -ENODEV; 417 413 ··· 424 422 return 0; 425 423 426 424 /* Checking only first OPP is sufficient */ 427 - np = of_get_next_available_child(opp_np, NULL); 425 + struct device_node *np __free(device_node) = 426 + of_get_next_available_child(opp_np, NULL); 427 + 428 428 if (!np) { 429 429 dev_err(dev, "OPP table empty\n"); 430 430 return -EINVAL; ··· 1273 1269 int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, 1274 1270 struct cpumask *cpumask) 1275 1271 { 1276 - struct device_node *np __free(device_node); 1277 1272 int cpu; 1278 1273 1279 1274 /* Get OPP descriptor node */ 1280 - np = dev_pm_opp_of_get_opp_desc_node(cpu_dev); 1275 + struct device_node *np __free(device_node) = 1276 + dev_pm_opp_of_get_opp_desc_node(cpu_dev); 1277 + 1281 1278 if (!np) { 1282 1279 dev_dbg(cpu_dev, "%s: Couldn't find opp node.\n", __func__); 1283 1280 return -ENOENT; ··· 1291 1286 return 0; 1292 1287 1293 1288 for_each_possible_cpu(cpu) { 1294 - struct device_node *cpu_np __free(device_node) = NULL; 1295 - struct device_node *tmp_np __free(device_node) = NULL; 1296 - 1297 1289 if (cpu == cpu_dev->id) 1298 1290 continue; 1299 1291 1300 - cpu_np = of_cpu_device_node_get(cpu); 1292 + struct device_node *cpu_np __free(device_node) = 1293 + of_cpu_device_node_get(cpu); 1294 + 1301 1295 if (!cpu_np) { 1302 1296 dev_err(cpu_dev, "%s: failed to get cpu%d node\n", 1303 1297 __func__, cpu); ··· 1304 1300 } 1305 1301 1306 1302 /* Get OPP descriptor node */ 1307 - tmp_np = 
_opp_of_get_opp_desc_node(cpu_np, 0); 1303 + struct device_node *tmp_np __free(device_node) = 1304 + _opp_of_get_opp_desc_node(cpu_np, 0); 1305 + 1308 1306 if (!tmp_np) { 1309 1307 pr_err("%pOF: Couldn't find opp node\n", cpu_np); 1310 1308 return -ENOENT; ··· 1334 1328 */ 1335 1329 int of_get_required_opp_performance_state(struct device_node *np, int index) 1336 1330 { 1337 - struct device_node *required_np __free(device_node); 1338 - struct opp_table *opp_table __free(put_opp_table) = NULL; 1339 - struct dev_pm_opp *opp __free(put_opp) = NULL; 1340 1331 int pstate = -EINVAL; 1341 1332 1342 - required_np = of_parse_required_opp(np, index); 1333 + struct device_node *required_np __free(device_node) = 1334 + of_parse_required_opp(np, index); 1335 + 1343 1336 if (!required_np) 1344 1337 return -ENODEV; 1345 1338 1346 - opp_table = _find_table_of_opp_np(required_np); 1339 + struct opp_table *opp_table __free(put_opp_table) = 1340 + _find_table_of_opp_np(required_np); 1341 + 1347 1342 if (IS_ERR(opp_table)) { 1348 1343 pr_err("%s: Failed to find required OPP table %pOF: %ld\n", 1349 1344 __func__, np, PTR_ERR(opp_table)); ··· 1357 1350 return -EINVAL; 1358 1351 } 1359 1352 1360 - opp = _find_opp_of_np(opp_table, required_np); 1353 + struct dev_pm_opp *opp __free(put_opp) = 1354 + _find_opp_of_np(opp_table, required_np); 1355 + 1361 1356 if (opp) { 1362 1357 if (opp->level == OPP_LEVEL_UNSET) { 1363 1358 pr_err("%s: OPP levels aren't available for %pOF\n", ··· 1385 1376 */ 1386 1377 bool dev_pm_opp_of_has_required_opp(struct device *dev) 1387 1378 { 1388 - struct device_node *np __free(device_node) = NULL, *opp_np __free(device_node); 1389 1379 int count; 1390 1380 1391 - opp_np = _opp_of_get_opp_desc_node(dev->of_node, 0); 1381 + struct device_node *opp_np __free(device_node) = 1382 + _opp_of_get_opp_desc_node(dev->of_node, 0); 1383 + 1392 1384 if (!opp_np) 1393 1385 return false; 1394 1386 1395 - np = of_get_next_available_child(opp_np, NULL); 1387 + struct 
device_node *np __free(device_node) = 1388 + of_get_next_available_child(opp_np, NULL); 1389 + 1396 1390 if (!np) { 1397 1391 dev_warn(dev, "Empty OPP table\n"); 1398 1392 return false; ··· 1437 1425 static int __maybe_unused 1438 1426 _get_dt_power(struct device *dev, unsigned long *uW, unsigned long *kHz) 1439 1427 { 1440 - struct dev_pm_opp *opp __free(put_opp); 1441 1428 unsigned long opp_freq, opp_power; 1442 1429 1443 1430 /* Find the right frequency and related OPP */ 1444 1431 opp_freq = *kHz * 1000; 1445 - opp = dev_pm_opp_find_freq_ceil(dev, &opp_freq); 1432 + 1433 + struct dev_pm_opp *opp __free(put_opp) = 1434 + dev_pm_opp_find_freq_ceil(dev, &opp_freq); 1435 + 1446 1436 if (IS_ERR(opp)) 1447 1437 return -EINVAL; 1448 1438 ··· 1479 1465 int dev_pm_opp_calc_power(struct device *dev, unsigned long *uW, 1480 1466 unsigned long *kHz) 1481 1467 { 1482 - struct dev_pm_opp *opp __free(put_opp) = NULL; 1483 - struct device_node *np __free(device_node); 1484 1468 unsigned long mV, Hz; 1485 1469 u32 cap; 1486 1470 u64 tmp; 1487 1471 int ret; 1488 1472 1489 - np = of_node_get(dev->of_node); 1473 + struct device_node *np __free(device_node) = of_node_get(dev->of_node); 1474 + 1490 1475 if (!np) 1491 1476 return -EINVAL; 1492 1477 ··· 1494 1481 return -EINVAL; 1495 1482 1496 1483 Hz = *kHz * 1000; 1497 - opp = dev_pm_opp_find_freq_ceil(dev, &Hz); 1484 + 1485 + struct dev_pm_opp *opp __free(put_opp) = 1486 + dev_pm_opp_find_freq_ceil(dev, &Hz); 1487 + 1498 1488 if (IS_ERR(opp)) 1499 1489 return -EINVAL; 1500 1490 ··· 1518 1502 1519 1503 static bool _of_has_opp_microwatt_property(struct device *dev) 1520 1504 { 1521 - struct dev_pm_opp *opp __free(put_opp); 1522 1505 unsigned long freq = 0; 1523 1506 1524 1507 /* Check if at least one OPP has needed property */ 1525 - opp = dev_pm_opp_find_freq_ceil(dev, &freq); 1508 + struct dev_pm_opp *opp __free(put_opp) = 1509 + dev_pm_opp_find_freq_ceil(dev, &freq); 1510 + 1526 1511 if (IS_ERR(opp)) 1527 1512 return false; 1528 
1513 ··· 1543 1526 */ 1544 1527 int dev_pm_opp_of_register_em(struct device *dev, struct cpumask *cpus) 1545 1528 { 1546 - struct device_node *np __free(device_node) = NULL; 1547 1529 struct em_data_callback em_cb; 1548 1530 int ret, nr_opp; 1549 1531 u32 cap; 1550 1532 1551 - if (IS_ERR_OR_NULL(dev)) { 1533 + if (IS_ERR_OR_NULL(dev)) 1534 + return -EINVAL; 1535 + 1536 + struct device_node *np __free(device_node) = of_node_get(dev->of_node); 1537 + 1538 + if (!np) { 1552 1539 ret = -EINVAL; 1553 1540 goto failed; 1554 1541 } ··· 1567 1546 if (_of_has_opp_microwatt_property(dev)) { 1568 1547 EM_SET_ACTIVE_POWER_CB(em_cb, _get_dt_power); 1569 1548 goto register_em; 1570 - } 1571 - 1572 - np = of_node_get(dev->of_node); 1573 - if (!np) { 1574 - ret = -EINVAL; 1575 - goto failed; 1576 1549 } 1577 1550 1578 1551 /*
+2 -2
drivers/pci/pci-sysfs.c
··· 1517 1517 return count; 1518 1518 } 1519 1519 1520 - ACQUIRE(pm_runtime_active_try, pm)(dev); 1521 - if (ACQUIRE_ERR(pm_runtime_active_try, &pm)) 1520 + PM_RUNTIME_ACQUIRE(dev, pm); 1521 + if (PM_RUNTIME_ACQUIRE_ERR(&pm)) 1522 1522 return -ENXIO; 1523 1523 1524 1524 if (sysfs_streq(buf, "default")) {
+8 -2
drivers/pmdomain/core.c
··· 1425 1425 return; 1426 1426 } 1427 1427 1428 - /* Choose the deepest state when suspending */ 1429 - genpd->state_idx = genpd->state_count - 1; 1428 + if (genpd->gov && genpd->gov->system_power_down_ok) { 1429 + if (!genpd->gov->system_power_down_ok(&genpd->domain)) 1430 + return; 1431 + } else { 1432 + /* Default to the deepest state. */ 1433 + genpd->state_idx = genpd->state_count - 1; 1434 + } 1435 + 1430 1436 if (_genpd_power_off(genpd, false)) { 1431 1437 genpd->states[genpd->state_idx].rejected++; 1432 1438 return;
+32 -1
drivers/pmdomain/governor.c
··· 351 351 ktime_t domain_wakeup, next_hrtimer; 352 352 ktime_t now = ktime_get(); 353 353 struct device *cpu_dev; 354 - s64 cpu_constraint, global_constraint; 354 + s64 cpu_constraint, global_constraint, wakeup_constraint; 355 355 s64 idle_duration_ns; 356 356 int cpu, i; 357 357 ··· 362 362 if (!(genpd->flags & GENPD_FLAG_CPU_DOMAIN)) 363 363 return true; 364 364 365 + wakeup_constraint = cpu_wakeup_latency_qos_limit(); 365 366 global_constraint = cpu_latency_qos_limit(); 367 + if (global_constraint > wakeup_constraint) 368 + global_constraint = wakeup_constraint; 369 + 366 370 /* 367 371 * Find the next wakeup for any of the online CPUs within the PM domain 368 372 * and its subdomains. Note, we only need the genpd->cpus, as it already ··· 419 415 return false; 420 416 } 421 417 418 + static bool cpu_system_power_down_ok(struct dev_pm_domain *pd) 419 + { 420 + s64 constraint_ns = cpu_wakeup_latency_qos_limit() * NSEC_PER_USEC; 421 + struct generic_pm_domain *genpd = pd_to_genpd(pd); 422 + int state_idx = genpd->state_count - 1; 423 + 424 + if (!(genpd->flags & GENPD_FLAG_CPU_DOMAIN)) { 425 + genpd->state_idx = state_idx; 426 + return true; 427 + } 428 + 429 + /* Find the deepest state for the latency constraint. */ 430 + while (state_idx >= 0) { 431 + s64 latency_ns = genpd->states[state_idx].power_off_latency_ns + 432 + genpd->states[state_idx].power_on_latency_ns; 433 + 434 + if (latency_ns <= constraint_ns) { 435 + genpd->state_idx = state_idx; 436 + return true; 437 + } 438 + state_idx--; 439 + } 440 + 441 + return false; 442 + } 443 + 422 444 struct dev_power_governor pm_domain_cpu_gov = { 423 445 .suspend_ok = default_suspend_ok, 424 446 .power_down_ok = cpu_power_down_ok, 447 + .system_power_down_ok = cpu_system_power_down_ok, 425 448 }; 426 449 #endif 427 450
+22 -17
drivers/powercap/intel_rapl_common.c
··· 253 253 static void rapl_init_domains(struct rapl_package *rp); 254 254 static int rapl_read_data_raw(struct rapl_domain *rd, 255 255 enum rapl_primitives prim, 256 - bool xlate, u64 *data); 256 + bool xlate, u64 *data, 257 + bool atomic); 257 258 static int rapl_write_data_raw(struct rapl_domain *rd, 258 259 enum rapl_primitives prim, 259 260 unsigned long long value); ··· 290 289 cpus_read_lock(); 291 290 rd = power_zone_to_rapl_domain(power_zone); 292 291 293 - if (!rapl_read_data_raw(rd, ENERGY_COUNTER, true, &energy_now)) { 292 + if (!rapl_read_data_raw(rd, ENERGY_COUNTER, true, &energy_now, false)) { 294 293 *energy_raw = energy_now; 295 294 cpus_read_unlock(); 296 295 ··· 831 830 * 63-------------------------- 31--------------------------- 0 832 831 */ 833 832 static int rapl_read_data_raw(struct rapl_domain *rd, 834 - enum rapl_primitives prim, bool xlate, u64 *data) 833 + enum rapl_primitives prim, bool xlate, u64 *data, 834 + bool atomic) 835 835 { 836 836 u64 value; 837 837 enum rapl_primitives prim_fixed = prim_fixups(rd, prim); ··· 854 852 855 853 ra.mask = rpi->mask; 856 854 857 - if (rd->rp->priv->read_raw(get_rid(rd->rp), &ra)) { 855 + if (rd->rp->priv->read_raw(get_rid(rd->rp), &ra, atomic)) { 858 856 pr_debug("failed to read reg 0x%llx for %s:%s\n", ra.reg.val, rd->rp->name, rd->name); 859 857 return -EIO; 860 858 } ··· 906 904 if (!is_pl_valid(rd, pl)) 907 905 return -EINVAL; 908 906 909 - return rapl_read_data_raw(rd, prim, xlate, data); 907 + return rapl_read_data_raw(rd, prim, xlate, data, false); 910 908 } 911 909 912 910 static int rapl_write_pl_data(struct rapl_domain *rd, int pl, ··· 943 941 944 942 ra.reg = rd->regs[RAPL_DOMAIN_REG_UNIT]; 945 943 ra.mask = ~0; 946 - if (rd->rp->priv->read_raw(get_rid(rd->rp), &ra)) { 944 + if (rd->rp->priv->read_raw(get_rid(rd->rp), &ra, false)) { 947 945 pr_err("Failed to read power unit REG 0x%llx on %s:%s, exit.\n", 948 946 ra.reg.val, rd->rp->name, rd->name); 949 947 return -ENODEV; ··· 971 969 
972 970 ra.reg = rd->regs[RAPL_DOMAIN_REG_UNIT]; 973 971 ra.mask = ~0; 974 - if (rd->rp->priv->read_raw(get_rid(rd->rp), &ra)) { 972 + if (rd->rp->priv->read_raw(get_rid(rd->rp), &ra, false)) { 975 973 pr_err("Failed to read power unit REG 0x%llx on %s:%s, exit.\n", 976 974 ra.reg.val, rd->rp->name, rd->name); 977 975 return -ENODEV; ··· 1158 1156 1159 1157 ra.reg = rd->regs[RAPL_DOMAIN_REG_UNIT]; 1160 1158 ra.mask = ~0; 1161 - if (rd->rp->priv->read_raw(get_rid(rd->rp), &ra)) { 1159 + if (rd->rp->priv->read_raw(get_rid(rd->rp), &ra, false)) { 1162 1160 pr_err("Failed to read power unit REG 0x%llx on %s:%s, exit.\n", 1163 1161 ra.reg.val, rd->rp->name, rd->name); 1164 1162 return -ENODEV; ··· 1286 1284 X86_MATCH_VFM(INTEL_EMERALDRAPIDS_X, &rapl_defaults_spr_server), 1287 1285 X86_MATCH_VFM(INTEL_LUNARLAKE_M, &rapl_defaults_core), 1288 1286 X86_MATCH_VFM(INTEL_PANTHERLAKE_L, &rapl_defaults_core), 1287 + X86_MATCH_VFM(INTEL_WILDCATLAKE_L, &rapl_defaults_core), 1288 + X86_MATCH_VFM(INTEL_NOVALAKE, &rapl_defaults_core), 1289 + X86_MATCH_VFM(INTEL_NOVALAKE_L, &rapl_defaults_core), 1289 1290 X86_MATCH_VFM(INTEL_ARROWLAKE_H, &rapl_defaults_core), 1290 1291 X86_MATCH_VFM(INTEL_ARROWLAKE, &rapl_defaults_core), 1291 1292 X86_MATCH_VFM(INTEL_ARROWLAKE_U, &rapl_defaults_core), ··· 1330 1325 struct rapl_primitive_info *rpi = get_rpi(rp, prim); 1331 1326 1332 1327 if (!rapl_read_data_raw(&rp->domains[dmn], prim, 1333 - rpi->unit, &val)) 1328 + rpi->unit, &val, false)) 1334 1329 rp->domains[dmn].rdd.primitives[prim] = val; 1335 1330 } 1336 1331 } ··· 1430 1425 */ 1431 1426 1432 1427 ra.mask = ENERGY_STATUS_MASK; 1433 - if (rp->priv->read_raw(get_rid(rp), &ra) || !ra.value) 1428 + if (rp->priv->read_raw(get_rid(rp), &ra, false) || !ra.value) 1434 1429 return -ENODEV; 1435 1430 1436 1431 return 0; ··· 1597 1592 if (!rp->has_pmu) 1598 1593 return nr_cpu_ids; 1599 1594 1600 - /* Only TPMI RAPL is supported for now */ 1601 - if (rp->priv->type != RAPL_IF_TPMI) 1595 + /* Only TPMI & 
MSR RAPL are supported for now */ 1596 + if (rp->priv->type != RAPL_IF_TPMI && rp->priv->type != RAPL_IF_MSR) 1602 1597 return nr_cpu_ids; 1603 1598 1604 - /* TPMI RAPL uses any CPU in the package for PMU */ 1599 + /* TPMI/MSR RAPL uses any CPU in the package for PMU */ 1605 1600 for_each_online_cpu(cpu) 1606 1601 if (topology_physical_package_id(cpu) == rp->id) 1607 1602 return cpu; ··· 1614 1609 if (!rp->has_pmu) 1615 1610 return false; 1616 1611 1617 - /* Only TPMI RAPL is supported for now */ 1618 - if (rp->priv->type != RAPL_IF_TPMI) 1612 + /* Only TPMI & MSR RAPL are supported for now */ 1613 + if (rp->priv->type != RAPL_IF_TPMI && rp->priv->type != RAPL_IF_MSR) 1619 1614 return false; 1620 1615 1621 - /* TPMI RAPL uses any CPU in the package for PMU */ 1616 + /* TPMI/MSR RAPL uses any CPU in the package for PMU */ 1622 1617 return topology_physical_package_id(cpu) == rp->id; 1623 1618 } 1624 1619 ··· 1641 1636 if (event->hw.idx < 0) 1642 1637 return 0; 1643 1638 1644 - ret = rapl_read_data_raw(&rp->domains[event->hw.idx], ENERGY_COUNTER, false, &val); 1639 + ret = rapl_read_data_raw(&rp->domains[event->hw.idx], ENERGY_COUNTER, false, &val, true); 1645 1640 1646 1641 /* Return 0 for failed read */ 1647 1642 if (ret)
+40 -3
drivers/powercap/intel_rapl_msr.c
··· 33 33 /* private data for RAPL MSR Interface */ 34 34 static struct rapl_if_priv *rapl_msr_priv; 35 35 36 + static bool rapl_msr_pmu __ro_after_init; 37 + 36 38 static struct rapl_if_priv rapl_msr_priv_intel = { 37 39 .type = RAPL_IF_MSR, 38 40 .reg_unit.msr = MSR_RAPL_POWER_UNIT, ··· 81 79 rp = rapl_add_package_cpuslocked(cpu, rapl_msr_priv, true); 82 80 if (IS_ERR(rp)) 83 81 return PTR_ERR(rp); 82 + if (rapl_msr_pmu) 83 + rapl_package_add_pmu(rp); 84 84 } 85 85 cpumask_set_cpu(cpu, &rp->cpumask); 86 86 return 0; ··· 99 95 100 96 cpumask_clear_cpu(cpu, &rp->cpumask); 101 97 lead_cpu = cpumask_first(&rp->cpumask); 102 - if (lead_cpu >= nr_cpu_ids) 98 + if (lead_cpu >= nr_cpu_ids) { 99 + if (rapl_msr_pmu) 100 + rapl_package_remove_pmu(rp); 103 101 rapl_remove_package_cpuslocked(rp); 104 - else if (rp->lead_cpu == cpu) 102 + } else if (rp->lead_cpu == cpu) { 105 103 rp->lead_cpu = lead_cpu; 104 + } 105 + 106 106 return 0; 107 107 } 108 108 109 - static int rapl_msr_read_raw(int cpu, struct reg_action *ra) 109 + static int rapl_msr_read_raw(int cpu, struct reg_action *ra, bool atomic) 110 110 { 111 + /* 112 + * When called from atomic-context (eg PMU event handler) 113 + * perform MSR read directly using rdmsrq(). 
114 + */ 115 + if (atomic) { 116 + if (unlikely(smp_processor_id() != cpu)) 117 + return -EIO; 118 + 119 + rdmsrq(ra->reg.msr, ra->value); 120 + goto out; 121 + } 122 + 111 123 if (rdmsrq_safe_on_cpu(cpu, ra->reg.msr, &ra->value)) { 112 124 pr_debug("failed to read msr 0x%x on cpu %d\n", ra->reg.msr, cpu); 113 125 return -EIO; 114 126 } 127 + 128 + out: 115 129 ra->value &= ra->mask; 116 130 return 0; 117 131 } ··· 173 151 X86_MATCH_VFM(INTEL_ARROWLAKE_U, NULL), 174 152 X86_MATCH_VFM(INTEL_ARROWLAKE_H, NULL), 175 153 X86_MATCH_VFM(INTEL_PANTHERLAKE_L, NULL), 154 + X86_MATCH_VFM(INTEL_WILDCATLAKE_L, NULL), 155 + X86_MATCH_VFM(INTEL_NOVALAKE, NULL), 156 + X86_MATCH_VFM(INTEL_NOVALAKE_L, NULL), 157 + {} 158 + }; 159 + 160 + /* List of MSR-based RAPL PMU support CPUs */ 161 + static const struct x86_cpu_id pmu_support_ids[] = { 162 + X86_MATCH_VFM(INTEL_PANTHERLAKE_L, NULL), 163 + X86_MATCH_VFM(INTEL_WILDCATLAKE_L, NULL), 176 164 {} 177 165 }; 178 166 ··· 211 179 rapl_msr_priv->regs[RAPL_DOMAIN_PACKAGE][RAPL_DOMAIN_REG_PL4].msr = 212 180 MSR_VR_CURRENT_CONFIG; 213 181 pr_info("PL4 support detected.\n"); 182 + } 183 + 184 + if (x86_match_cpu(pmu_support_ids)) { 185 + rapl_msr_pmu = true; 186 + pr_info("MSR-based RAPL PMU support enabled\n"); 214 187 } 215 188 216 189 rapl_msr_priv->control_type = powercap_register_control_type(NULL, "intel-rapl", NULL);
+1 -1
drivers/powercap/intel_rapl_tpmi.c
··· 60 60 61 61 static struct powercap_control_type *tpmi_control_type; 62 62 63 - static int tpmi_rapl_read_raw(int id, struct reg_action *ra) 63 + static int tpmi_rapl_read_raw(int id, struct reg_action *ra, bool atomic) 64 64 { 65 65 if (!ra->reg.mmio) 66 66 return -EINVAL;
+1
drivers/scsi/mesh.c
··· 1762 1762 case PM_EVENT_SUSPEND: 1763 1763 case PM_EVENT_HIBERNATE: 1764 1764 case PM_EVENT_FREEZE: 1765 + case PM_EVENT_POWEROFF: 1765 1766 break; 1766 1767 default: 1767 1768 return 0;
+1
drivers/scsi/stex.c
··· 1965 1965 case PM_EVENT_SUSPEND: 1966 1966 return ST_S3; 1967 1967 case PM_EVENT_HIBERNATE: 1968 + case PM_EVENT_POWEROFF: 1968 1969 hba->msi_lock = 0; 1969 1970 return ST_S4; 1970 1971 default:
+1 -1
drivers/thermal/intel/int340x_thermal/processor_thermal_rapl.c
··· 19 19 .limits[RAPL_DOMAIN_DRAM] = BIT(POWER_LIMIT2), 20 20 }; 21 21 22 - static int rapl_mmio_read_raw(int cpu, struct reg_action *ra) 22 + static int rapl_mmio_read_raw(int cpu, struct reg_action *ra, bool atomic) 23 23 { 24 24 if (!ra->reg.mmio) 25 25 return -EINVAL;
+1
drivers/usb/host/sl811-hcd.c
··· 1748 1748 break; 1749 1749 case PM_EVENT_SUSPEND: 1750 1750 case PM_EVENT_HIBERNATE: 1751 + case PM_EVENT_POWEROFF: 1751 1752 case PM_EVENT_PRETHAW: /* explicitly discard hw state */ 1752 1753 port_power(sl811, 0); 1753 1754 break;
+4 -2
include/linux/cpuidle.h
··· 248 248 struct cpuidle_device *dev, 249 249 u64 latency_limit_ns); 250 250 extern int cpuidle_enter_s2idle(struct cpuidle_driver *drv, 251 - struct cpuidle_device *dev); 251 + struct cpuidle_device *dev, 252 + u64 latency_limit_ns); 252 253 extern void cpuidle_use_deepest_state(u64 latency_limit_ns); 253 254 #else 254 255 static inline int cpuidle_find_deepest_state(struct cpuidle_driver *drv, ··· 257 256 u64 latency_limit_ns) 258 257 {return -ENODEV; } 259 258 static inline int cpuidle_enter_s2idle(struct cpuidle_driver *drv, 260 - struct cpuidle_device *dev) 259 + struct cpuidle_device *dev, 260 + u64 latency_limit_ns) 261 261 {return -ENODEV; } 262 262 static inline void cpuidle_use_deepest_state(u64 latency_limit_ns) 263 263 {
+4
include/linux/energy_model.h
··· 54 54 /** 55 55 * struct em_perf_domain - Performance domain 56 56 * @em_table: Pointer to the runtime modifiable em_perf_table 57 + * @node: node in em_pd_list (in energy_model.c) 58 + * @id: A unique ID number for each performance domain 57 59 * @nr_perf_states: Number of performance states 58 60 * @min_perf_state: Minimum allowed Performance State index 59 61 * @max_perf_state: Maximum allowed Performance State index ··· 73 71 */ 74 72 struct em_perf_domain { 75 73 struct em_perf_table __rcu *em_table; 74 + struct list_head node; 75 + int id; 76 76 int nr_perf_states; 77 77 int min_perf_state; 78 78 int max_perf_state;
+8 -4
include/linux/freezer.h
··· 22 22 extern unsigned int freeze_timeout_msecs; 23 23 24 24 /* 25 - * Check if a process has been frozen 25 + * Check if a process has been frozen for PM or cgroup1 freezer. Note that 26 + * cgroup2 freezer uses the job control mechanism and does not interact with 27 + * the PM freezer. 26 28 */ 27 29 extern bool frozen(struct task_struct *p); 28 30 29 31 extern bool freezing_slow_path(struct task_struct *p); 30 32 31 33 /* 32 - * Check if there is a request to freeze a process 34 + * Check if there is a request to freeze a task from PM or cgroup1 freezer. 35 + * Note that cgroup2 freezer uses the job control mechanism and does not 36 + * interact with the PM freezer. 33 37 */ 34 38 static inline bool freezing(struct task_struct *p) 35 39 { ··· 67 63 extern bool set_freezable(void); 68 64 69 65 #ifdef CONFIG_CGROUP_FREEZER 70 - extern bool cgroup_freezing(struct task_struct *task); 66 + extern bool cgroup1_freezing(struct task_struct *task); 71 67 #else /* !CONFIG_CGROUP_FREEZER */ 72 - static inline bool cgroup_freezing(struct task_struct *task) 68 + static inline bool cgroup1_freezing(struct task_struct *task) 73 69 { 74 70 return false; 75 71 }
+1 -1
include/linux/intel_rapl.h
··· 152 152 union rapl_reg reg_unit; 153 153 union rapl_reg regs[RAPL_DOMAIN_MAX][RAPL_DOMAIN_REG_MAX]; 154 154 int limits[RAPL_DOMAIN_MAX]; 155 - int (*read_raw)(int id, struct reg_action *ra); 155 + int (*read_raw)(int id, struct reg_action *ra, bool atomic); 156 156 int (*write_raw)(int id, struct reg_action *ra); 157 157 void *defaults; 158 158 void *rpi;
+6 -2
include/linux/pm.h
··· 25 25 26 26 struct device; /* we have a circular dep with device.h */ 27 27 #ifdef CONFIG_VT_CONSOLE_SLEEP 28 - extern void pm_vt_switch_required(struct device *dev, bool required); 28 + extern int pm_vt_switch_required(struct device *dev, bool required); 29 29 extern void pm_vt_switch_unregister(struct device *dev); 30 30 #else 31 - static inline void pm_vt_switch_required(struct device *dev, bool required) 31 + static inline int pm_vt_switch_required(struct device *dev, bool required) 32 32 { 33 + return 0; 33 34 } 34 35 static inline void pm_vt_switch_unregister(struct device *dev) 35 36 { ··· 508 507 * RECOVER Creation of a hibernation image or restoration of the main 509 508 * memory contents from a hibernation image has failed, call 510 509 * ->thaw() and ->complete() for all devices. 510 + * POWEROFF System will poweroff, call ->poweroff() for all devices. 511 511 * 512 512 * The following PM_EVENT_ messages are defined for internal use by 513 513 * kernel subsystems. They are never issued by the PM core. ··· 539 537 #define PM_EVENT_USER 0x0100 540 538 #define PM_EVENT_REMOTE 0x0200 541 539 #define PM_EVENT_AUTO 0x0400 540 + #define PM_EVENT_POWEROFF 0x0800 542 541 543 542 #define PM_EVENT_SLEEP (PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE) 544 543 #define PM_EVENT_USER_SUSPEND (PM_EVENT_USER | PM_EVENT_SUSPEND) ··· 554 551 #define PMSG_QUIESCE ((struct pm_message){ .event = PM_EVENT_QUIESCE, }) 555 552 #define PMSG_SUSPEND ((struct pm_message){ .event = PM_EVENT_SUSPEND, }) 556 553 #define PMSG_HIBERNATE ((struct pm_message){ .event = PM_EVENT_HIBERNATE, }) 554 + #define PMSG_POWEROFF ((struct pm_message){ .event = PM_EVENT_POWEROFF, }) 557 555 #define PMSG_RESUME ((struct pm_message){ .event = PM_EVENT_RESUME, }) 558 556 #define PMSG_THAW ((struct pm_message){ .event = PM_EVENT_THAW, }) 559 557 #define PMSG_RESTORE ((struct pm_message){ .event = PM_EVENT_RESTORE, })
+1
include/linux/pm_domain.h
··· 153 153 }; 154 154 155 155 struct dev_power_governor { 156 + bool (*system_power_down_ok)(struct dev_pm_domain *domain); 156 157 bool (*power_down_ok)(struct dev_pm_domain *domain); 157 158 bool (*suspend_ok)(struct device *dev); 158 159 };
+9
include/linux/pm_qos.h
··· 162 162 static inline void cpu_latency_qos_remove_request(struct pm_qos_request *req) {} 163 163 #endif 164 164 165 + #ifdef CONFIG_PM_QOS_CPU_SYSTEM_WAKEUP 166 + s32 cpu_wakeup_latency_qos_limit(void); 167 + #else 168 + static inline s32 cpu_wakeup_latency_qos_limit(void) 169 + { 170 + return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT; 171 + } 172 + #endif 173 + 165 174 #ifdef CONFIG_PM 166 175 enum pm_qos_flags_status __dev_pm_qos_flags(struct device *dev, s32 mask); 167 176 enum pm_qos_flags_status dev_pm_qos_flags(struct device *dev, s32 mask);
+24
include/linux/pm_runtime.h
··· 637 637 DEFINE_GUARD_COND(pm_runtime_active_auto, _try_enabled, 638 638 pm_runtime_resume_and_get(_T), _RET == 0) 639 639 640 + /* ACQUIRE() wrapper macros for the guards defined above. */ 641 + 642 + #define PM_RUNTIME_ACQUIRE(_dev, _var) \ 643 + ACQUIRE(pm_runtime_active_try, _var)(_dev) 644 + 645 + #define PM_RUNTIME_ACQUIRE_AUTOSUSPEND(_dev, _var) \ 646 + ACQUIRE(pm_runtime_active_auto_try, _var)(_dev) 647 + 648 + #define PM_RUNTIME_ACQUIRE_IF_ENABLED(_dev, _var) \ 649 + ACQUIRE(pm_runtime_active_try_enabled, _var)(_dev) 650 + 651 + #define PM_RUNTIME_ACQUIRE_IF_ENABLED_AUTOSUSPEND(_dev, _var) \ 652 + ACQUIRE(pm_runtime_active_auto_try_enabled, _var)(_dev) 653 + 654 + /* 655 + * ACQUIRE_ERR() wrapper macro for guard pm_runtime_active. 656 + * 657 + * Always check PM_RUNTIME_ACQUIRE_ERR() after using one of the 658 + * PM_RUNTIME_ACQUIRE*() macros defined above (yes, it can be used with 659 + * any of them) and if it is nonzero, avoid accessing the given device. 660 + */ 661 + #define PM_RUNTIME_ACQUIRE_ERR(_var_ptr) \ 662 + ACQUIRE_ERR(pm_runtime_active, _var_ptr) 663 + 640 664 /** 641 665 * pm_runtime_put_sync - Drop device usage counter and run "idle check" if 0. 642 666 * @dev: Target device.
+2 -1
include/trace/events/power.h
··· 179 179 { PM_EVENT_HIBERNATE, "hibernate" }, \ 180 180 { PM_EVENT_THAW, "thaw" }, \ 181 181 { PM_EVENT_RESTORE, "restore" }, \ 182 - { PM_EVENT_RECOVER, "recover" }) 182 + { PM_EVENT_RECOVER, "recover" }, \ 183 + { PM_EVENT_POWEROFF, "poweroff" }) 183 184 184 185 DEFINE_EVENT(cpu, cpu_frequency, 185 186
+62
include/uapi/linux/energy_model.h
··· 1 + /* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */ 2 + /* Do not edit directly, auto-generated from: */ 3 + /* Documentation/netlink/specs/em.yaml */ 4 + /* YNL-GEN uapi header */ 5 + 6 + #ifndef _UAPI_LINUX_ENERGY_MODEL_H 7 + #define _UAPI_LINUX_ENERGY_MODEL_H 8 + 9 + #define EM_FAMILY_NAME "em" 10 + #define EM_FAMILY_VERSION 1 11 + 12 + enum { 13 + EM_A_PDS_PD = 1, 14 + 15 + __EM_A_PDS_MAX, 16 + EM_A_PDS_MAX = (__EM_A_PDS_MAX - 1) 17 + }; 18 + 19 + enum { 20 + EM_A_PD_PAD = 1, 21 + EM_A_PD_PD_ID, 22 + EM_A_PD_FLAGS, 23 + EM_A_PD_CPUS, 24 + 25 + __EM_A_PD_MAX, 26 + EM_A_PD_MAX = (__EM_A_PD_MAX - 1) 27 + }; 28 + 29 + enum { 30 + EM_A_PD_TABLE_PD_ID = 1, 31 + EM_A_PD_TABLE_PS, 32 + 33 + __EM_A_PD_TABLE_MAX, 34 + EM_A_PD_TABLE_MAX = (__EM_A_PD_TABLE_MAX - 1) 35 + }; 36 + 37 + enum { 38 + EM_A_PS_PAD = 1, 39 + EM_A_PS_PERFORMANCE, 40 + EM_A_PS_FREQUENCY, 41 + EM_A_PS_POWER, 42 + EM_A_PS_COST, 43 + EM_A_PS_FLAGS, 44 + 45 + __EM_A_PS_MAX, 46 + EM_A_PS_MAX = (__EM_A_PS_MAX - 1) 47 + }; 48 + 49 + enum { 50 + EM_CMD_GET_PDS = 1, 51 + EM_CMD_GET_PD_TABLE, 52 + EM_CMD_PD_CREATED, 53 + EM_CMD_PD_UPDATED, 54 + EM_CMD_PD_DELETED, 55 + 56 + __EM_CMD_MAX, 57 + EM_CMD_MAX = (__EM_CMD_MAX - 1) 58 + }; 59 + 60 + #define EM_MCGRP_EVENT "event" 61 + 62 + #endif /* _UAPI_LINUX_ENERGY_MODEL_H */
+1 -1
kernel/cgroup/legacy_freezer.c
··· 63 63 return css_freezer(freezer->css.parent); 64 64 } 65 65 66 - bool cgroup_freezing(struct task_struct *task) 66 + bool cgroup1_freezing(struct task_struct *task) 67 67 { 68 68 bool ret; 69 69
+1 -1
kernel/freezer.c
··· 44 44 if (tsk_is_oom_victim(p)) 45 45 return false; 46 46 47 - if (pm_nosig_freezing || cgroup_freezing(p)) 47 + if (pm_nosig_freezing || cgroup1_freezing(p)) 48 48 return true; 49 49 50 50 if (pm_freezing && !(p->flags & PF_KTHREAD))
+11
kernel/power/Kconfig
··· 202 202 depends on PM_WAKELOCKS 203 203 default y 204 204 205 + config PM_QOS_CPU_SYSTEM_WAKEUP 206 + bool "User space interface for CPU system wakeup QoS" 207 + depends on CPU_IDLE 208 + help 209 + Enable this to allow user space via the cpu_wakeup_latency file to 210 + specify a CPU system wakeup latency limit. 211 + 212 + This may be particularly useful for platforms supporting multiple low 213 + power states for CPUs during system-wide suspend and s2idle in 214 + particular. 215 + 205 216 config PM 206 217 bool "Device power management core functionality" 207 218 help
+3 -1
kernel/power/Makefile
··· 21 21 22 22 obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o 23 23 24 - obj-$(CONFIG_ENERGY_MODEL) += energy_model.o 24 + obj-$(CONFIG_ENERGY_MODEL) += em.o 25 + em-y := energy_model.o 26 + em-$(CONFIG_NET) += em_netlink_autogen.o em_netlink.o
+6 -2
kernel/power/console.c
··· 44 44 * no_console_suspend argument has been passed on the command line, VT 45 45 * switches will occur. 46 46 */ 47 - void pm_vt_switch_required(struct device *dev, bool required) 47 + int pm_vt_switch_required(struct device *dev, bool required) 48 48 { 49 49 struct pm_vt_switch *entry, *tmp; 50 + int ret = 0; 50 51 51 52 mutex_lock(&vt_switch_mutex); 52 53 list_for_each_entry(tmp, &pm_vt_switch_list, head) { ··· 59 58 } 60 59 61 60 entry = kmalloc(sizeof(*entry), GFP_KERNEL); 62 - if (!entry) 61 + if (!entry) { 62 + ret = -ENOMEM; 63 63 goto out; 64 + } 64 65 65 66 entry->required = required; 66 67 entry->dev = dev; ··· 70 67 list_add(&entry->head, &pm_vt_switch_list); 71 68 out: 72 69 mutex_unlock(&vt_switch_mutex); 70 + return ret; 73 71 } 74 72 EXPORT_SYMBOL(pm_vt_switch_required); 75 73
+308
kernel/power/em_netlink.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * 4 + * Generic netlink for energy model. 5 + * 6 + * Copyright (c) 2025 Valve Corporation. 7 + * Author: Changwoo Min <changwoo@igalia.com> 8 + */ 9 + 10 + #define pr_fmt(fmt) "energy_model: " fmt 11 + 12 + #include <linux/energy_model.h> 13 + #include <net/sock.h> 14 + #include <net/genetlink.h> 15 + #include <uapi/linux/energy_model.h> 16 + 17 + #include "em_netlink.h" 18 + #include "em_netlink_autogen.h" 19 + 20 + #define EM_A_PD_CPUS_LEN 256 21 + 22 + /*************************** Command encoding ********************************/ 23 + static int __em_nl_get_pd_size(struct em_perf_domain *pd, void *data) 24 + { 25 + char cpus_buf[EM_A_PD_CPUS_LEN]; 26 + int *tot_msg_sz = data; 27 + int msg_sz, cpus_sz; 28 + 29 + cpus_sz = snprintf(cpus_buf, sizeof(cpus_buf), "%*pb", 30 + cpumask_pr_args(to_cpumask(pd->cpus))); 31 + 32 + msg_sz = nla_total_size(0) + /* EM_A_PDS_PD */ 33 + nla_total_size(sizeof(u32)) + /* EM_A_PD_PD_ID */ 34 + nla_total_size_64bit(sizeof(u64)) + /* EM_A_PD_FLAGS */ 35 + nla_total_size(cpus_sz); /* EM_A_PD_CPUS */ 36 + 37 + *tot_msg_sz += nlmsg_total_size(genlmsg_msg_size(msg_sz)); 38 + return 0; 39 + } 40 + 41 + static int __em_nl_get_pd(struct em_perf_domain *pd, void *data) 42 + { 43 + char cpus_buf[EM_A_PD_CPUS_LEN]; 44 + struct sk_buff *msg = data; 45 + struct nlattr *entry; 46 + 47 + entry = nla_nest_start(msg, EM_A_PDS_PD); 48 + if (!entry) 49 + goto out_cancel_nest; 50 + 51 + if (nla_put_u32(msg, EM_A_PD_PD_ID, pd->id)) 52 + goto out_cancel_nest; 53 + 54 + if (nla_put_u64_64bit(msg, EM_A_PD_FLAGS, pd->flags, EM_A_PD_PAD)) 55 + goto out_cancel_nest; 56 + 57 + snprintf(cpus_buf, sizeof(cpus_buf), "%*pb", 58 + cpumask_pr_args(to_cpumask(pd->cpus))); 59 + if (nla_put_string(msg, EM_A_PD_CPUS, cpus_buf)) 60 + goto out_cancel_nest; 61 + 62 + nla_nest_end(msg, entry); 63 + 64 + return 0; 65 + 66 + out_cancel_nest: 67 + nla_nest_cancel(msg, entry); 68 + 69 + return -EMSGSIZE; 70 + } 71 + 72 + 
int em_nl_get_pds_doit(struct sk_buff *skb, struct genl_info *info) 73 + { 74 + struct sk_buff *msg; 75 + void *hdr; 76 + int cmd = info->genlhdr->cmd; 77 + int ret = -EMSGSIZE, msg_sz = 0; 78 + 79 + for_each_em_perf_domain(__em_nl_get_pd_size, &msg_sz); 80 + 81 + msg = genlmsg_new(msg_sz, GFP_KERNEL); 82 + if (!msg) 83 + return -ENOMEM; 84 + 85 + hdr = genlmsg_put_reply(msg, info, &em_nl_family, 0, cmd); 86 + if (!hdr) 87 + goto out_free_msg; 88 + 89 + ret = for_each_em_perf_domain(__em_nl_get_pd, msg); 90 + if (ret) 91 + goto out_cancel_msg; 92 + 93 + genlmsg_end(msg, hdr); 94 + 95 + return genlmsg_reply(msg, info); 96 + 97 + out_cancel_msg: 98 + genlmsg_cancel(msg, hdr); 99 + out_free_msg: 100 + nlmsg_free(msg); 101 + 102 + return ret; 103 + } 104 + 105 + static struct em_perf_domain *__em_nl_get_pd_table_id(struct nlattr **attrs) 106 + { 107 + struct em_perf_domain *pd; 108 + int id; 109 + 110 + if (!attrs[EM_A_PD_TABLE_PD_ID]) 111 + return NULL; 112 + 113 + id = nla_get_u32(attrs[EM_A_PD_TABLE_PD_ID]); 114 + pd = em_perf_domain_get_by_id(id); 115 + return pd; 116 + } 117 + 118 + static int __em_nl_get_pd_table_size(const struct em_perf_domain *pd) 119 + { 120 + int id_sz, ps_sz; 121 + 122 + id_sz = nla_total_size(sizeof(u32)); /* EM_A_PD_TABLE_PD_ID */ 123 + ps_sz = nla_total_size(0) + /* EM_A_PD_TABLE_PS */ 124 + nla_total_size_64bit(sizeof(u64)) + /* EM_A_PS_PERFORMANCE */ 125 + nla_total_size_64bit(sizeof(u64)) + /* EM_A_PS_FREQUENCY */ 126 + nla_total_size_64bit(sizeof(u64)) + /* EM_A_PS_POWER */ 127 + nla_total_size_64bit(sizeof(u64)) + /* EM_A_PS_COST */ 128 + nla_total_size_64bit(sizeof(u64)); /* EM_A_PS_FLAGS */ 129 + ps_sz *= pd->nr_perf_states; 130 + 131 + return nlmsg_total_size(genlmsg_msg_size(id_sz + ps_sz)); 132 + } 133 + 134 + static int __em_nl_get_pd_table(struct sk_buff *msg, const struct em_perf_domain *pd) 135 + { 136 + struct em_perf_state *table, *ps; 137 + struct nlattr *entry; 138 + int i; 139 + 140 + if (nla_put_u32(msg, 
EM_A_PD_TABLE_PD_ID, pd->id)) 141 + goto out_err; 142 + 143 + rcu_read_lock(); 144 + table = em_perf_state_from_pd((struct em_perf_domain *)pd); 145 + 146 + for (i = 0; i < pd->nr_perf_states; i++) { 147 + ps = &table[i]; 148 + 149 + entry = nla_nest_start(msg, EM_A_PD_TABLE_PS); 150 + if (!entry) 151 + goto out_unlock_ps; 152 + 153 + if (nla_put_u64_64bit(msg, EM_A_PS_PERFORMANCE, 154 + ps->performance, EM_A_PS_PAD)) 155 + goto out_cancel_ps_nest; 156 + if (nla_put_u64_64bit(msg, EM_A_PS_FREQUENCY, 157 + ps->frequency, EM_A_PS_PAD)) 158 + goto out_cancel_ps_nest; 159 + if (nla_put_u64_64bit(msg, EM_A_PS_POWER, 160 + ps->power, EM_A_PS_PAD)) 161 + goto out_cancel_ps_nest; 162 + if (nla_put_u64_64bit(msg, EM_A_PS_COST, 163 + ps->cost, EM_A_PS_PAD)) 164 + goto out_cancel_ps_nest; 165 + if (nla_put_u64_64bit(msg, EM_A_PS_FLAGS, 166 + ps->flags, EM_A_PS_PAD)) 167 + goto out_cancel_ps_nest; 168 + 169 + nla_nest_end(msg, entry); 170 + } 171 + rcu_read_unlock(); 172 + return 0; 173 + 174 + out_cancel_ps_nest: 175 + nla_nest_cancel(msg, entry); 176 + out_unlock_ps: 177 + rcu_read_unlock(); 178 + out_err: 179 + return -EMSGSIZE; 180 + } 181 + 182 + int em_nl_get_pd_table_doit(struct sk_buff *skb, struct genl_info *info) 183 + { 184 + int cmd = info->genlhdr->cmd; 185 + int msg_sz, ret = -EMSGSIZE; 186 + struct em_perf_domain *pd; 187 + struct sk_buff *msg; 188 + void *hdr; 189 + 190 + pd = __em_nl_get_pd_table_id(info->attrs); 191 + if (!pd) 192 + return -EINVAL; 193 + 194 + msg_sz = __em_nl_get_pd_table_size(pd); 195 + 196 + msg = genlmsg_new(msg_sz, GFP_KERNEL); 197 + if (!msg) 198 + return -ENOMEM; 199 + 200 + hdr = genlmsg_put_reply(msg, info, &em_nl_family, 0, cmd); 201 + if (!hdr) 202 + goto out_free_msg; 203 + 204 + ret = __em_nl_get_pd_table(msg, pd); 205 + if (ret) 206 + goto out_free_msg; 207 + 208 + genlmsg_end(msg, hdr); 209 + return genlmsg_reply(msg, info); 210 + 211 + out_free_msg: 212 + nlmsg_free(msg); 213 + return ret; 214 + } 215 + 216 + 217 + 
/**************************** Event encoding *********************************/ 218 + static void __em_notify_pd_table(const struct em_perf_domain *pd, int ntf_type) 219 + { 220 + struct sk_buff *msg; 221 + int msg_sz, ret = -EMSGSIZE; 222 + void *hdr; 223 + 224 + if (!genl_has_listeners(&em_nl_family, &init_net, EM_NLGRP_EVENT)) 225 + return; 226 + 227 + msg_sz = __em_nl_get_pd_table_size(pd); 228 + 229 + msg = genlmsg_new(msg_sz, GFP_KERNEL); 230 + if (!msg) 231 + return; 232 + 233 + hdr = genlmsg_put(msg, 0, 0, &em_nl_family, 0, ntf_type); 234 + if (!hdr) 235 + goto out_free_msg; 236 + 237 + ret = __em_nl_get_pd_table(msg, pd); 238 + if (ret) 239 + goto out_free_msg; 240 + 241 + genlmsg_end(msg, hdr); 242 + 243 + genlmsg_multicast(&em_nl_family, msg, 0, EM_NLGRP_EVENT, GFP_KERNEL); 244 + 245 + return; 246 + 247 + out_free_msg: 248 + nlmsg_free(msg); 249 + return; 250 + } 251 + 252 + void em_notify_pd_created(const struct em_perf_domain *pd) 253 + { 254 + __em_notify_pd_table(pd, EM_CMD_PD_CREATED); 255 + } 256 + 257 + void em_notify_pd_updated(const struct em_perf_domain *pd) 258 + { 259 + __em_notify_pd_table(pd, EM_CMD_PD_UPDATED); 260 + } 261 + 262 + static int __em_notify_pd_deleted_size(const struct em_perf_domain *pd) 263 + { 264 + int id_sz = nla_total_size(sizeof(u32)); /* EM_A_PD_TABLE_PD_ID */ 265 + 266 + return nlmsg_total_size(genlmsg_msg_size(id_sz)); 267 + } 268 + 269 + void em_notify_pd_deleted(const struct em_perf_domain *pd) 270 + { 271 + struct sk_buff *msg; 272 + void *hdr; 273 + int msg_sz; 274 + 275 + if (!genl_has_listeners(&em_nl_family, &init_net, EM_NLGRP_EVENT)) 276 + return; 277 + 278 + msg_sz = __em_notify_pd_deleted_size(pd); 279 + 280 + msg = genlmsg_new(msg_sz, GFP_KERNEL); 281 + if (!msg) 282 + return; 283 + 284 + hdr = genlmsg_put(msg, 0, 0, &em_nl_family, 0, EM_CMD_PD_DELETED); 285 + if (!hdr) 286 + goto out_free_msg; 287 + 288 + if (nla_put_u32(msg, EM_A_PD_TABLE_PD_ID, pd->id)) { 289 + goto out_free_msg; 290 + } 291 + 292 + 
genlmsg_end(msg, hdr); 293 + 294 + genlmsg_multicast(&em_nl_family, msg, 0, EM_NLGRP_EVENT, GFP_KERNEL); 295 + 296 + return; 297 + 298 + out_free_msg: 299 + nlmsg_free(msg); 300 + return; 301 + } 302 + 303 + /**************************** Initialization *********************************/ 304 + static int __init em_netlink_init(void) 305 + { 306 + return genl_register_family(&em_nl_family); 307 + } 308 + postcore_initcall(em_netlink_init);
+39
kernel/power/em_netlink.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + /* 3 + * 4 + * Generic netlink for energy model. 5 + * 6 + * Copyright (c) 2025 Valve Corporation. 7 + * Author: Changwoo Min <changwoo@igalia.com> 8 + */ 9 + #ifndef _EM_NETLINK_H 10 + #define _EM_NETLINK_H 11 + 12 + #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_NET) 13 + int for_each_em_perf_domain(int (*cb)(struct em_perf_domain*, void *), 14 + void *data); 15 + struct em_perf_domain *em_perf_domain_get_by_id(int id); 16 + void em_notify_pd_created(const struct em_perf_domain *pd); 17 + void em_notify_pd_deleted(const struct em_perf_domain *pd); 18 + void em_notify_pd_updated(const struct em_perf_domain *pd); 19 + #else 20 + static inline 21 + int for_each_em_perf_domain(int (*cb)(struct em_perf_domain*, void *), 22 + void *data) 23 + { 24 + return -EINVAL; 25 + } 26 + static inline 27 + struct em_perf_domain *em_perf_domain_get_by_id(int id) 28 + { 29 + return NULL; 30 + } 31 + 32 + static inline void em_notify_pd_created(const struct em_perf_domain *pd) {} 33 + 34 + static inline void em_notify_pd_deleted(const struct em_perf_domain *pd) {} 35 + 36 + static inline void em_notify_pd_updated(const struct em_perf_domain *pd) {} 37 + #endif 38 + 39 + #endif /* _EM_NETLINK_H */
+88 -2
kernel/power/energy_model.c
··· 17 17 #include <linux/sched/topology.h> 18 18 #include <linux/slab.h> 19 19 20 + #include "em_netlink.h" 21 + 20 22 /* 21 23 * Mutex serializing the registrations of performance domains and letting 22 24 * callbacks defined by drivers sleep. 23 25 */ 24 26 static DEFINE_MUTEX(em_pd_mutex); 27 + 28 + /* 29 + * Manage performance domains with IDs. One can iterate the performance domains 30 + * through the list and pick one with their associated ID. The mutex serializes 31 + * the list access. When holding em_pd_list_mutex, em_pd_mutex should not be 32 + * taken to avoid potential deadlock. 33 + */ 34 + static DEFINE_IDA(em_pd_ida); 35 + static LIST_HEAD(em_pd_list); 36 + static DEFINE_MUTEX(em_pd_list_mutex); 25 37 26 38 static void em_cpufreq_update_efficiencies(struct device *dev, 27 39 struct em_perf_state *table); ··· 128 116 } 129 117 DEFINE_SHOW_ATTRIBUTE(em_debug_flags); 130 118 119 + static int em_debug_id_show(struct seq_file *s, void *unused) 120 + { 121 + struct em_perf_domain *pd = s->private; 122 + 123 + seq_printf(s, "%d\n", pd->id); 124 + 125 + return 0; 126 + } 127 + DEFINE_SHOW_ATTRIBUTE(em_debug_id); 128 + 131 129 static void em_debug_create_pd(struct device *dev) 132 130 { 133 131 struct em_dbg_info *em_dbg; ··· 153 131 154 132 debugfs_create_file("flags", 0444, d, dev->em_pd, 155 133 &em_debug_flags_fops); 134 + 135 + debugfs_create_file("id", 0444, d, dev->em_pd, &em_debug_id_fops); 156 136 157 137 em_dbg = devm_kcalloc(dev, dev->em_pd->nr_perf_states, 158 138 sizeof(*em_dbg), GFP_KERNEL); ··· 352 328 em_table_free(old_table); 353 329 354 330 mutex_unlock(&em_pd_mutex); 331 + 332 + em_notify_pd_updated(pd); 355 333 return 0; 356 334 } 357 335 EXPORT_SYMBOL_GPL(em_dev_update_perf_domain); ··· 422 396 struct em_perf_table *em_table; 423 397 struct em_perf_domain *pd; 424 398 struct device *cpu_dev; 425 - int cpu, ret, num_cpus; 399 + int cpu, ret, num_cpus, id; 426 400 427 401 if (_is_cpu_device(dev)) { 428 402 num_cpus = cpumask_weight(cpus); 
··· 445 419 } 446 420 447 421 pd->nr_perf_states = nr_states; 422 + 423 + INIT_LIST_HEAD(&pd->node); 424 + 425 + id = ida_alloc(&em_pd_ida, GFP_KERNEL); 426 + if (id < 0) 427 + return -ENOMEM; 428 + pd->id = id; 448 429 449 430 em_table = em_table_alloc(pd); 450 431 if (!em_table) ··· 477 444 kfree(em_table); 478 445 free_pd: 479 446 kfree(pd); 447 + ida_free(&em_pd_ida, id); 480 448 return -EINVAL; 481 449 } 482 450 ··· 693 659 694 660 unlock: 695 661 mutex_unlock(&em_pd_mutex); 662 + if (ret) 663 + return ret; 696 664 697 - return ret; 665 + mutex_lock(&em_pd_list_mutex); 666 + list_add_tail(&dev->em_pd->node, &em_pd_list); 667 + mutex_unlock(&em_pd_list_mutex); 668 + 669 + em_notify_pd_created(dev->em_pd); 670 + 671 + return 0; 698 672 } 699 673 EXPORT_SYMBOL_GPL(em_dev_register_pd_no_update); 700 674 ··· 720 678 if (_is_cpu_device(dev)) 721 679 return; 722 680 681 + mutex_lock(&em_pd_list_mutex); 682 + list_del_init(&dev->em_pd->node); 683 + mutex_unlock(&em_pd_list_mutex); 684 + 685 + em_notify_pd_deleted(dev->em_pd); 686 + 723 687 /* 724 688 * The mutex separates all register/unregister requests and protects 725 689 * from potential clean-up/setup issues in the debugfs directories. 
··· 736 688 737 689 em_table_free(rcu_dereference_protected(dev->em_pd->em_table, 738 690 lockdep_is_held(&em_pd_mutex))); 691 + 692 + ida_free(&em_pd_ida, dev->em_pd->id); 739 693 740 694 kfree(dev->em_pd); 741 695 dev->em_pd = NULL; ··· 1008 958 */ 1009 959 schedule_work(&rebuild_sd_work); 1010 960 } 961 + 962 + #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_NET) 963 + int for_each_em_perf_domain(int (*cb)(struct em_perf_domain*, void *), 964 + void *data) 965 + { 966 + struct em_perf_domain *pd; 967 + 968 + lockdep_assert_not_held(&em_pd_mutex); 969 + guard(mutex)(&em_pd_list_mutex); 970 + 971 + list_for_each_entry(pd, &em_pd_list, node) { 972 + int ret; 973 + 974 + ret = cb(pd, data); 975 + if (ret) 976 + return ret; 977 + } 978 + 979 + return 0; 980 + } 981 + 982 + struct em_perf_domain *em_perf_domain_get_by_id(int id) 983 + { 984 + struct em_perf_domain *pd; 985 + 986 + lockdep_assert_not_held(&em_pd_mutex); 987 + guard(mutex)(&em_pd_list_mutex); 988 + 989 + list_for_each_entry(pd, &em_pd_list, node) { 990 + if (pd->id == id) 991 + return pd; 992 + } 993 + 994 + return NULL; 995 + } 996 + #endif
+5 -1
kernel/power/hibernate.c
··· 820 820 if (error) 821 821 goto Restore; 822 822 823 - ksys_sync_helper(); 823 + error = pm_sleep_fs_sync(); 824 + if (error) 825 + goto Notify; 826 + 824 827 filesystems_freeze(filesystem_freeze_enabled); 825 828 826 829 error = freeze_processes(); ··· 894 891 freezer_test_done = false; 895 892 Exit: 896 893 filesystems_thaw(); 894 + Notify: 897 895 pm_notifier_call_chain(PM_POST_HIBERNATION); 898 896 Restore: 899 897 pm_restore_console();
+74 -7
kernel/power/main.c
··· 18 18 #include <linux/suspend.h> 19 19 #include <linux/syscalls.h> 20 20 #include <linux/pm_runtime.h> 21 + #include <linux/atomic.h> 22 + #include <linux/wait.h> 21 23 22 24 #include "power.h" 23 25 ··· 93 91 elapsed_msecs / MSEC_PER_SEC, elapsed_msecs % MSEC_PER_SEC); 94 92 } 95 93 EXPORT_SYMBOL_GPL(ksys_sync_helper); 94 + 95 + #if defined(CONFIG_SUSPEND) || defined(CONFIG_HIBERNATION) 96 + /* Wakeup events handling resolution while syncing file systems in jiffies */ 97 + #define PM_FS_SYNC_WAKEUP_RESOLUTION 5 98 + 99 + static atomic_t pm_fs_sync_count = ATOMIC_INIT(0); 100 + static struct workqueue_struct *pm_fs_sync_wq; 101 + static DECLARE_WAIT_QUEUE_HEAD(pm_fs_sync_wait); 102 + 103 + static bool pm_fs_sync_completed(void) 104 + { 105 + return atomic_read(&pm_fs_sync_count) == 0; 106 + } 107 + 108 + static void pm_fs_sync_work_fn(struct work_struct *work) 109 + { 110 + ksys_sync_helper(); 111 + 112 + if (atomic_dec_and_test(&pm_fs_sync_count)) 113 + wake_up(&pm_fs_sync_wait); 114 + } 115 + static DECLARE_WORK(pm_fs_sync_work, pm_fs_sync_work_fn); 116 + 117 + /** 118 + * pm_sleep_fs_sync() - Sync file systems in an interruptible way 119 + * 120 + * Return: 0 on successful file system sync, or -EBUSY if the file system sync 121 + * was aborted. 122 + */ 123 + int pm_sleep_fs_sync(void) 124 + { 125 + pm_wakeup_clear(0); 126 + 127 + /* 128 + * Take back-to-back sleeps into account by queuing a subsequent fs sync 129 + * only if the previous fs sync is running or is not queued. Multiple fs 130 + * syncs increase the likelihood of saving the latest files immediately 131 + * before sleep. 
132 + */ 133 + if (!work_pending(&pm_fs_sync_work)) { 134 + atomic_inc(&pm_fs_sync_count); 135 + queue_work(pm_fs_sync_wq, &pm_fs_sync_work); 136 + } 137 + 138 + while (!pm_fs_sync_completed()) { 139 + if (pm_wakeup_pending()) 140 + return -EBUSY; 141 + 142 + wait_event_timeout(pm_fs_sync_wait, pm_fs_sync_completed(), 143 + PM_FS_SYNC_WAKEUP_RESOLUTION); 144 + } 145 + 146 + return 0; 147 + } 148 + #endif /* CONFIG_SUSPEND || CONFIG_HIBERNATION */ 96 149 97 150 /* Routines for PM-transition notifications */ 98 151 ··· 288 231 power_attr(mem_sleep); 289 232 290 233 /* 291 - * sync_on_suspend: invoke ksys_sync_helper() before suspend. 234 + * sync_on_suspend: Sync file systems before suspend. 292 235 * 293 - * show() returns whether ksys_sync_helper() is invoked before suspend. 294 - * store() accepts 0 or 1. 0 disables ksys_sync_helper() and 1 enables it. 236 + * show() returns whether file systems sync before suspend is enabled. 237 + * store() accepts 0 or 1. 0 disables file systems sync and 1 enables it. 295 238 */ 296 239 bool sync_on_suspend_enabled = !IS_ENABLED(CONFIG_SUSPEND_SKIP_SYNC); 297 240 ··· 1123 1066 struct workqueue_struct *pm_wq; 1124 1067 EXPORT_SYMBOL_GPL(pm_wq); 1125 1068 1126 - static int __init pm_start_workqueue(void) 1069 + static int __init pm_start_workqueues(void) 1127 1070 { 1128 - pm_wq = alloc_workqueue("pm", WQ_FREEZABLE, 0); 1071 + pm_wq = alloc_workqueue("pm", WQ_FREEZABLE | WQ_UNBOUND, 0); 1072 + if (!pm_wq) 1073 + return -ENOMEM; 1129 1074 1130 - return pm_wq ? 
0 : -ENOMEM; 1075 + #if defined(CONFIG_SUSPEND) || defined(CONFIG_HIBERNATION) 1076 + pm_fs_sync_wq = alloc_ordered_workqueue("pm_fs_sync", 0); 1077 + if (!pm_fs_sync_wq) { 1078 + destroy_workqueue(pm_wq); 1079 + return -ENOMEM; 1080 + } 1081 + #endif 1082 + 1083 + return 0; 1131 1084 } 1132 1085 1133 1086 static int __init pm_init(void) 1134 1087 { 1135 - int error = pm_start_workqueue(); 1088 + int error = pm_start_workqueues(); 1136 1089 if (error) 1137 1090 return error; 1138 1091 hibernate_image_size_init();
+1
kernel/power/power.h
··· 19 19 } __aligned(PAGE_SIZE); 20 20 21 21 #if defined(CONFIG_SUSPEND) || defined(CONFIG_HIBERNATION) 22 + extern int pm_sleep_fs_sync(void); 22 23 extern bool filesystem_freeze_enabled; 23 24 #endif 24 25
+106
kernel/power/qos.c
···
 	.fops = &cpu_latency_qos_fops,
 };
 
+#ifdef CONFIG_PM_QOS_CPU_SYSTEM_WAKEUP
+/* The CPU system wakeup latency QoS. */
+static struct pm_qos_constraints cpu_wakeup_latency_constraints = {
+	.list = PLIST_HEAD_INIT(cpu_wakeup_latency_constraints.list),
+	.target_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
+	.default_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
+	.no_constraint_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
+	.type = PM_QOS_MIN,
+};
+
+/**
+ * cpu_wakeup_latency_qos_limit - Current CPU system wakeup latency QoS limit.
+ *
+ * Returns the current CPU system wakeup latency QoS limit that may have been
+ * requested by user space.
+ */
+s32 cpu_wakeup_latency_qos_limit(void)
+{
+	return pm_qos_read_value(&cpu_wakeup_latency_constraints);
+}
+
+static int cpu_wakeup_latency_qos_open(struct inode *inode, struct file *filp)
+{
+	struct pm_qos_request *req;
+
+	req = kzalloc(sizeof(*req), GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+
+	req->qos = &cpu_wakeup_latency_constraints;
+	pm_qos_update_target(req->qos, &req->node, PM_QOS_ADD_REQ,
+			     PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
+	filp->private_data = req;
+
+	return 0;
+}
+
+static int cpu_wakeup_latency_qos_release(struct inode *inode,
+					  struct file *filp)
+{
+	struct pm_qos_request *req = filp->private_data;
+
+	filp->private_data = NULL;
+	pm_qos_update_target(req->qos, &req->node, PM_QOS_REMOVE_REQ,
+			     PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
+	kfree(req);
+
+	return 0;
+}
+
+static ssize_t cpu_wakeup_latency_qos_read(struct file *filp, char __user *buf,
+					   size_t count, loff_t *f_pos)
+{
+	s32 value = pm_qos_read_value(&cpu_wakeup_latency_constraints);
+
+	return simple_read_from_buffer(buf, count, f_pos, &value, sizeof(s32));
+}
+
+static ssize_t cpu_wakeup_latency_qos_write(struct file *filp,
+					    const char __user *buf,
+					    size_t count, loff_t *f_pos)
+{
+	struct pm_qos_request *req = filp->private_data;
+	s32 value;
+
+	if (count == sizeof(s32)) {
+		if (copy_from_user(&value, buf, sizeof(s32)))
+			return -EFAULT;
+	} else {
+		int ret;
+
+		ret = kstrtos32_from_user(buf, count, 16, &value);
+		if (ret)
+			return ret;
+	}
+
+	if (value < 0)
+		return -EINVAL;
+
+	pm_qos_update_target(req->qos, &req->node, PM_QOS_UPDATE_REQ, value);
+
+	return count;
+}
+
+static const struct file_operations cpu_wakeup_latency_qos_fops = {
+	.open = cpu_wakeup_latency_qos_open,
+	.release = cpu_wakeup_latency_qos_release,
+	.read = cpu_wakeup_latency_qos_read,
+	.write = cpu_wakeup_latency_qos_write,
+	.llseek = noop_llseek,
+};
+
+static struct miscdevice cpu_wakeup_latency_qos_miscdev = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "cpu_wakeup_latency",
+	.fops = &cpu_wakeup_latency_qos_fops,
+};
+#endif /* CONFIG_PM_QOS_CPU_SYSTEM_WAKEUP */
+
 static int __init cpu_latency_qos_init(void)
 {
 	int ret;
···
 	if (ret < 0)
 		pr_err("%s: %s setup failed\n", __func__,
 		       cpu_latency_qos_miscdev.name);
+
+#ifdef CONFIG_PM_QOS_CPU_SYSTEM_WAKEUP
+	ret = misc_register(&cpu_wakeup_latency_qos_miscdev);
+	if (ret < 0)
+		pr_err("%s: %s setup failed\n", __func__,
+		       cpu_wakeup_latency_qos_miscdev.name);
+#endif
 
 	return ret;
 }
kernel/power/snapshot.c (+6 -7)
···
 {
 	unsigned int nr_pages, nr_highmem;
 
-	pr_info("Creating image:\n");
+	pm_deferred_pr_dbg("Creating image\n");
 
 	drain_local_pages(NULL);
 	nr_pages = count_data_pages();
 	nr_highmem = count_highmem_pages();
-	pr_info("Need to copy %u pages\n", nr_pages + nr_highmem);
+	pm_deferred_pr_dbg("Need to copy %u pages\n", nr_pages + nr_highmem);
 
 	if (!enough_free_mem(nr_pages, nr_highmem)) {
-		pr_err("Not enough free memory\n");
+		pm_deferred_pr_dbg("Not enough free memory for image creation\n");
 		return -ENOMEM;
 	}
 
-	if (swsusp_alloc(&copy_bm, nr_pages, nr_highmem)) {
-		pr_err("Memory allocation failed\n");
+	if (swsusp_alloc(&copy_bm, nr_pages, nr_highmem))
 		return -ENOMEM;
-	}
 
 	/*
 	 * During allocating of suspend pagedir, new cold pages may appear.
···
 	nr_zero_pages = nr_pages - nr_copy_pages;
 	nr_meta_pages = DIV_ROUND_UP(nr_pages * sizeof(long), PAGE_SIZE);
 
-	pr_info("Image created (%d pages copied, %d zero pages)\n", nr_copy_pages, nr_zero_pages);
+	pm_deferred_pr_dbg("Image created (%d pages copied, %d zero pages)\n",
+			   nr_copy_pages, nr_zero_pages);
 
 	return 0;
 }
kernel/power/suspend.c (+10 -2)
···
 static int suspend_test(int level)
 {
 #ifdef CONFIG_PM_DEBUG
+	int i;
+
 	if (pm_test_level == level) {
 		pr_info("suspend debug: Waiting for %d second(s).\n",
 			pm_test_delay);
-		mdelay(pm_test_delay * 1000);
+		for (i = 0; i < pm_test_delay && !pm_wakeup_pending(); i++)
+			msleep(1000);
+
 		return 1;
 	}
 #endif /* !CONFIG_PM_DEBUG */
···
 
 	if (sync_on_suspend_enabled) {
 		trace_suspend_resume(TPS("sync_filesystems"), 0, true);
-		ksys_sync_helper();
+
+		error = pm_sleep_fs_sync();
+		if (error)
+			goto Unlock;
+
 		trace_suspend_resume(TPS("sync_filesystems"), 0, false);
 	}
kernel/power/swap.c (+146 -110)
···
 static bool clean_pages_on_decompress;
 
 /*
- * The swap map is a data structure used for keeping track of each page
- * written to a swap partition.  It consists of many swap_map_page
- * structures that contain each an array of MAP_PAGE_ENTRIES swap entries.
- * These structures are stored on the swap and linked together with the
- * help of the .next_swap member.
+ * The swap map is a data structure used for keeping track of each page
+ * written to a swap partition. It consists of many swap_map_page structures
+ * that contain each an array of MAP_PAGE_ENTRIES swap entries. These
+ * structures are stored on the swap and linked together with the help of the
+ * .next_swap member.
  *
- * The swap map is created during suspend.  The swap map pages are
- * allocated and populated one at a time, so we only need one memory
- * page to set up the entire structure.
+ * The swap map is created during suspend. The swap map pages are allocated and
+ * populated one at a time, so we only need one memory page to set up the entire
+ * structure.
  *
- * During resume we pick up all swap_map_page structures into a list.
+ * During resume we pick up all swap_map_page structures into a list.
  */
-
 #define MAP_PAGE_ENTRIES	(PAGE_SIZE / sizeof(sector_t) - 1)
 
 /*
···
 };
 
 /*
- * The swap_map_handle structure is used for handling swap in
- * a file-alike way
+ * The swap_map_handle structure is used for handling swap in a file-alike way.
  */
-
 struct swap_map_handle {
 	struct swap_map_page *cur;
 	struct swap_map_page_list *maps;
···
 static struct swsusp_header *swsusp_header;
 
 /*
- * The following functions are used for tracing the allocated
- * swap pages, so that they can be freed in case of an error.
+ * The following functions are used for tracing the allocated swap pages, so
+ * that they can be freed in case of an error.
  */
-
 struct swsusp_extent {
 	struct rb_node node;
 	unsigned long start;
···
 	return 0;
 }
 
-/*
- * alloc_swapdev_block - allocate a swap page and register that it has
- * been allocated, so that it can be freed in case of an error.
- */
-
 sector_t alloc_swapdev_block(int swap)
 {
 	unsigned long offset;
 
+	/*
+	 * Allocate a swap page and register that it has been allocated, so that
+	 * it can be freed in case of an error.
+	 */
 	offset = swp_offset(get_swap_page_of_type(swap));
 	if (offset) {
 		if (swsusp_extents_insert(offset))
···
 	return 0;
 }
 
-/*
- * free_all_swap_pages - free swap pages allocated for saving image data.
- * It also frees the extents used to register which swap entries had been
- * allocated.
- */
-
 void free_all_swap_pages(int swap)
 {
 	struct rb_node *node;
 
+	/*
+	 * Free swap pages allocated for saving image data. It also frees the
+	 * extents used to register which swap entries had been allocated.
+	 */
 	while ((node = swsusp_extents.rb_node)) {
 		struct swsusp_extent *ext;
 
···
 /*
  *	Saving part
  */
+
 static int mark_swapfiles(struct swap_map_handle *handle, unsigned int flags)
 {
 	int error;
···
  */
 unsigned int swsusp_header_flags;
 
-/**
- * swsusp_swap_check - check if the resume device is a swap device
- * and get its index (if so)
- *
- * This is called before saving image
- */
 static int swsusp_swap_check(void)
 {
 	int res;
 
+	/*
+	 * Check if the resume device is a swap device and get its index (if so).
+	 * This is called before saving the image.
+	 */
 	if (swsusp_resume_device)
 		res = swap_type_of(swsusp_resume_device, swsusp_resume_block);
 	else
···
 
 	return 0;
 }
-
-/**
- * write_page - Write one page to given swap location.
- * @buf: Address we're writing.
- * @offset: Offset of the swap page we're writing to.
- * @hb: bio completion batch
- */
 
 static int write_page(void *buf, sector_t offset, struct hib_bio_batch *hb)
 {
···
 				CMP_HEADER, PAGE_SIZE)
 #define CMP_SIZE	(CMP_PAGES * PAGE_SIZE)
 
-/* Maximum number of threads for compression/decompression. */
-#define CMP_THREADS	3
+/* Default number of threads for compression/decompression. */
+#define CMP_THREADS	3
+static unsigned int hibernate_compression_threads = CMP_THREADS;
 
 /* Minimum/maximum number of pages for read buffering. */
 #define CMP_MIN_RD_PAGES	1024
 #define CMP_MAX_RD_PAGES	8192
-
-/**
- * save_image - save the suspend image data
- */
 
 static int save_image(struct swap_map_handle *handle,
 		      struct snapshot_handle *snapshot,
···
 	wait_queue_head_t go;			/* start crc update */
 	wait_queue_head_t done;			/* crc update done */
 	u32 *crc32;				/* points to handle's crc32 */
-	size_t *unc_len[CMP_THREADS];		/* uncompressed lengths */
-	unsigned char *unc[CMP_THREADS];	/* uncompressed data */
+	size_t **unc_len;			/* uncompressed lengths */
+	unsigned char **unc;			/* uncompressed data */
 };
 
-/*
- * CRC32 update function that runs in its own thread.
- */
+static struct crc_data *alloc_crc_data(int nr_threads)
+{
+	struct crc_data *crc;
+
+	crc = kzalloc(sizeof(*crc), GFP_KERNEL);
+	if (!crc)
+		return NULL;
+
+	crc->unc = kcalloc(nr_threads, sizeof(*crc->unc), GFP_KERNEL);
+	if (!crc->unc)
+		goto err_free_crc;
+
+	crc->unc_len = kcalloc(nr_threads, sizeof(*crc->unc_len), GFP_KERNEL);
+	if (!crc->unc_len)
+		goto err_free_unc;
+
+	return crc;
+
+err_free_unc:
+	kfree(crc->unc);
+err_free_crc:
+	kfree(crc);
+	return NULL;
+}
+
+static void free_crc_data(struct crc_data *crc)
+{
+	if (!crc)
+		return;
+
+	if (crc->thr)
+		kthread_stop(crc->thr);
+
+	kfree(crc->unc_len);
+	kfree(crc->unc);
+	kfree(crc);
+}
+
 static int crc32_threadfn(void *data)
 {
 	struct crc_data *d = data;
···
 		}
 	}
 	return 0;
 }
+
 /*
  * Structure used for data compression.
  */
···
 /* Indicates the image size after compression */
 static atomic64_t compressed_size = ATOMIC_INIT(0);
 
-/*
- * Compression function that runs in its own thread.
- */
 static int compress_threadfn(void *data)
 {
 	struct cmp_data *d = data;
···
 	return 0;
 }
 
-/**
- * save_compressed_image - Save the suspend image data after compression.
- * @handle: Swap map handle to use for saving the image.
- * @snapshot: Image to read data from.
- * @nr_to_write: Number of pages to save.
- */
 static int save_compressed_image(struct swap_map_handle *handle,
 				 struct snapshot_handle *snapshot,
 				 unsigned int nr_to_write)
···
 	 * footprint.
 	 */
 	nr_threads = num_online_cpus() - 1;
-	nr_threads = clamp_val(nr_threads, 1, CMP_THREADS);
+	nr_threads = clamp_val(nr_threads, 1, hibernate_compression_threads);
 
 	page = (void *)__get_free_page(GFP_NOIO | __GFP_HIGH);
 	if (!page) {
···
 		goto out_clean;
 	}
 
-	crc = kzalloc(sizeof(*crc), GFP_KERNEL);
+	crc = alloc_crc_data(nr_threads);
 	if (!crc) {
 		pr_err("Failed to allocate crc\n");
 		ret = -ENOMEM;
···
 
 out_clean:
 	hib_finish_batch(&hb);
-	if (crc) {
-		if (crc->thr)
-			kthread_stop(crc->thr);
-		kfree(crc);
-	}
+	free_crc_data(crc);
 	if (data) {
 		for (thr = 0; thr < nr_threads; thr++) {
 			if (data[thr].thr)
···
 	return ret;
 }
 
-/**
- * enough_swap - Make sure we have enough swap to save the image.
- *
- * Returns TRUE or FALSE after checking the total amount of swap
- * space available from the resume partition.
- */
-
 static int enough_swap(unsigned int nr_pages)
 {
 	unsigned int free_swap = count_swap_pages(root_swap, 1);
···
 }
 
 /**
- * swsusp_write - Write entire image and metadata.
- * @flags: flags to pass to the "boot" kernel in the image header
+ * swsusp_write - Write entire image and metadata.
+ * @flags: flags to pass to the "boot" kernel in the image header
  *
- * It is important _NOT_ to umount filesystems at this point. We want
- * them synced (in case something goes wrong) but we DO not want to mark
- * filesystem clean: it is not. (And it does not matter, if we resume
- * correctly, we'll mark system clean, anyway.)
+ * It is important _NOT_ to umount filesystems at this point. We want them
+ * synced (in case something goes wrong) but we DO not want to mark filesystem
+ * clean: it is not. (And it does not matter, if we resume correctly, we'll mark
+ * system clean, anyway.)
+ *
+ * Return: 0 on success, negative error code on failure.
  */
-
 int swsusp_write(unsigned int flags)
 {
 	struct swap_map_handle handle;
···
 }
 
 /*
- * The following functions allow us to read data using a swap map
- * in a file-like way.
+ * The following functions allow us to read data using a swap map in a file-like
+ * way.
  */
 
 static void release_swap_reader(struct swap_map_handle *handle)
···
 	return 0;
 }
 
-/**
- * load_image - load the image using the swap map handle
- * @handle and the snapshot handle @snapshot
- * (assume there are @nr_pages pages to load)
- */
-
 static int load_image(struct swap_map_handle *handle,
 		      struct snapshot_handle *snapshot,
 		      unsigned int nr_to_read)
···
 	unsigned char cmp[CMP_SIZE];		/* compressed buffer */
 };
 
-/*
- * Decompression function that runs in its own thread.
- */
 static int decompress_threadfn(void *data)
 {
 	struct dec_data *d = data;
···
 	return 0;
 }
 
-/**
- * load_compressed_image - Load compressed image data and decompress it.
- * @handle: Swap map handle to use for loading data.
- * @snapshot: Image to copy uncompressed data into.
- * @nr_to_read: Number of pages to load.
- */
 static int load_compressed_image(struct swap_map_handle *handle,
 				 struct snapshot_handle *snapshot,
 				 unsigned int nr_to_read)
···
 	 * footprint.
 	 */
 	nr_threads = num_online_cpus() - 1;
-	nr_threads = clamp_val(nr_threads, 1, CMP_THREADS);
+	nr_threads = clamp_val(nr_threads, 1, hibernate_compression_threads);
 
 	page = vmalloc_array(CMP_MAX_RD_PAGES, sizeof(*page));
 	if (!page) {
···
 		goto out_clean;
 	}
 
-	crc = kzalloc(sizeof(*crc), GFP_KERNEL);
+	crc = alloc_crc_data(nr_threads);
 	if (!crc) {
 		pr_err("Failed to allocate crc\n");
 		ret = -ENOMEM;
···
 	hib_finish_batch(&hb);
 	for (i = 0; i < ring_size; i++)
 		free_page((unsigned long)page[i]);
-	if (crc) {
-		if (crc->thr)
-			kthread_stop(crc->thr);
-		kfree(crc);
-	}
+	free_crc_data(crc);
 	if (data) {
 		for (thr = 0; thr < nr_threads; thr++) {
 			if (data[thr].thr)
···
  * swsusp_read - read the hibernation image.
  * @flags_p: flags passed by the "frozen" kernel in the image header should
  *	     be written into this memory location
+ *
+ * Return: 0 on success, negative error code on failure.
  */
-
 int swsusp_read(unsigned int *flags_p)
 {
 	int error;
···
 /**
  * swsusp_check - Open the resume device and check for the swsusp signature.
  * @exclusive: Open the resume device exclusively.
+ *
+ * Return: 0 if a valid image is found, negative error code otherwise.
  */
-
 int swsusp_check(bool exclusive)
 {
 	void *holder = exclusive ? &swsusp_holder : NULL;
···
 /**
  * swsusp_close - close resume device.
  */
-
 void swsusp_close(void)
 {
 	if (IS_ERR(hib_resume_bdev_file)) {
···
 }
 
 /**
- * swsusp_unmark - Unmark swsusp signature in the resume device
+ * swsusp_unmark - Unmark swsusp signature in the resume device
+ *
+ * Return: 0 on success, negative error code on failure.
  */
-
 #ifdef CONFIG_SUSPEND
 int swsusp_unmark(void)
 {
···
 }
 #endif
 
+static ssize_t hibernate_compression_threads_show(struct kobject *kobj,
+						  struct kobj_attribute *attr,
+						  char *buf)
+{
+	return sysfs_emit(buf, "%d\n", hibernate_compression_threads);
+}
+
+static ssize_t hibernate_compression_threads_store(struct kobject *kobj,
+						   struct kobj_attribute *attr,
+						   const char *buf, size_t n)
+{
+	unsigned long val;
+
+	if (kstrtoul(buf, 0, &val))
+		return -EINVAL;
+
+	if (val < 1)
+		return -EINVAL;
+
+	hibernate_compression_threads = val;
+	return n;
+}
+power_attr(hibernate_compression_threads);
+
+static struct attribute *g[] = {
+	&hibernate_compression_threads_attr.attr,
+	NULL,
+};
+
+static const struct attribute_group attr_group = {
+	.attrs = g,
+};
+
 static int __init swsusp_header_init(void)
 {
+	int error;
+
+	error = sysfs_create_group(power_kobj, &attr_group);
+	if (error)
+		return -ENOMEM;
+
 	swsusp_header = (struct swsusp_header*) __get_free_page(GFP_KERNEL);
 	if (!swsusp_header)
 		panic("Could not allocate memory for swsusp_header\n");
···
 }
 
 core_initcall(swsusp_header_init);
+
+static int __init hibernate_compression_threads_setup(char *str)
+{
+	int rc = kstrtouint(str, 0, &hibernate_compression_threads);
+
+	if (rc)
+		return rc;
+
+	if (hibernate_compression_threads < 1)
+		hibernate_compression_threads = CMP_THREADS;
+
+	return 1;
+}
+
+__setup("hibernate_compression_threads=", hibernate_compression_threads_setup);
kernel/power/user.c (+3 -1)
···
 		if (data->frozen)
 			break;
 
-		ksys_sync_helper();
+		error = pm_sleep_fs_sync();
+		if (error)
+			break;
 
 		error = freeze_processes();
 		if (error)
kernel/sched/idle.c (+7 -5)
···
 }
 
 static int call_cpuidle_s2idle(struct cpuidle_driver *drv,
-			       struct cpuidle_device *dev)
+			       struct cpuidle_device *dev,
+			       u64 max_latency_ns)
 {
 	if (current_clr_polling_and_test())
 		return -EBUSY;
 
-	return cpuidle_enter_s2idle(drv, dev);
+	return cpuidle_enter_s2idle(drv, dev, max_latency_ns);
 }
 
 static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
···
 	u64 max_latency_ns;
 
 	if (idle_should_enter_s2idle()) {
+		max_latency_ns = cpu_wakeup_latency_qos_limit() *
+				 NSEC_PER_USEC;
 
-		entered_state = call_cpuidle_s2idle(drv, dev);
+		entered_state = call_cpuidle_s2idle(drv, dev,
+						    max_latency_ns);
 		if (entered_state > 0)
 			goto exit_idle;
-
-		max_latency_ns = U64_MAX;
 	} else {
 		max_latency_ns = dev->forced_idle_latency_limit_ns;
 	}
rust/kernel/opp.rs (+62 -58)
···
 
 use macros::vtable;
 
-/// Creates a null-terminated slice of pointers to [`Cstring`]s.
+/// Creates a null-terminated slice of pointers to [`CString`]s.
 fn to_c_str_array(names: &[CString]) -> Result<KVec<*const u8>> {
     // Allocated a null-terminated vector of pointers.
     let mut list = KVec::with_capacity(names.len() + 1, GFP_KERNEL)?;
···
     ///
     /// The returned [`ConfigToken`] will remove the configuration when dropped.
     pub fn set(self, dev: &Device) -> Result<ConfigToken> {
-        let (_clk_list, clk_names) = match &self.clk_names {
-            Some(x) => {
-                let list = to_c_str_array(x)?;
-                let ptr = list.as_ptr();
-                (Some(list), ptr)
-            }
-            None => (None, ptr::null()),
+        let clk_names = self.clk_names.as_deref().map(to_c_str_array).transpose()?;
+        let regulator_names = self
+            .regulator_names
+            .as_deref()
+            .map(to_c_str_array)
+            .transpose()?;
+
+        let set_config = || {
+            let clk_names = clk_names.as_ref().map_or(ptr::null(), |c| c.as_ptr());
+            let regulator_names = regulator_names.as_ref().map_or(ptr::null(), |c| c.as_ptr());
+
+            let prop_name = self
+                .prop_name
+                .as_ref()
+                .map_or(ptr::null(), |p| p.as_char_ptr());
+
+            let (supported_hw, supported_hw_count) = self
+                .supported_hw
+                .as_ref()
+                .map_or((ptr::null(), 0), |hw| (hw.as_ptr(), hw.len() as u32));
+
+            let (required_dev, required_dev_index) = self
+                .required_dev
+                .as_ref()
+                .map_or((ptr::null_mut(), 0), |(dev, idx)| (dev.as_raw(), *idx));
+
+            let mut config = bindings::dev_pm_opp_config {
+                clk_names,
+                config_clks: if T::HAS_CONFIG_CLKS {
+                    Some(Self::config_clks)
+                } else {
+                    None
+                },
+                prop_name,
+                regulator_names,
+                config_regulators: if T::HAS_CONFIG_REGULATORS {
+                    Some(Self::config_regulators)
+                } else {
+                    None
+                },
+                supported_hw,
+                supported_hw_count,
+                required_dev,
+                required_dev_index,
+            };
+
+            // SAFETY: The requirements are satisfied by the existence of [`Device`] and its safety
+            // requirements. The OPP core guarantees not to access fields of [`Config`] after this
+            // call and so we don't need to save a copy of them for future use.
+            let ret = unsafe { bindings::dev_pm_opp_set_config(dev.as_raw(), &mut config) };
+
+            to_result(ret).map(|()| ConfigToken(ret))
         };
 
-        let (_regulator_list, regulator_names) = match &self.regulator_names {
-            Some(x) => {
-                let list = to_c_str_array(x)?;
-                let ptr = list.as_ptr();
-                (Some(list), ptr)
-            }
-            None => (None, ptr::null()),
-        };
+        // Ensure the closure does not accidentally drop owned data; if violated, the compiler
+        // produces E0525 with e.g.:
+        //
+        // ```
+        // closure is `FnOnce` because it moves the variable `clk_names` out of its environment
+        // ```
+        let _: &dyn Fn() -> _ = &set_config;
 
-        let prop_name = self
-            .prop_name
-            .as_ref()
-            .map_or(ptr::null(), |p| p.as_char_ptr());
-
-        let (supported_hw, supported_hw_count) = self
-            .supported_hw
-            .as_ref()
-            .map_or((ptr::null(), 0), |hw| (hw.as_ptr(), hw.len() as u32));
-
-        let (required_dev, required_dev_index) = self
-            .required_dev
-            .as_ref()
-            .map_or((ptr::null_mut(), 0), |(dev, idx)| (dev.as_raw(), *idx));
-
-        let mut config = bindings::dev_pm_opp_config {
-            clk_names,
-            config_clks: if T::HAS_CONFIG_CLKS {
-                Some(Self::config_clks)
-            } else {
-                None
-            },
-            prop_name,
-            regulator_names,
-            config_regulators: if T::HAS_CONFIG_REGULATORS {
-                Some(Self::config_regulators)
-            } else {
-                None
-            },
-            supported_hw,
-            supported_hw_count,
-
-            required_dev,
-            required_dev_index,
-        };
-
-        // SAFETY: The requirements are satisfied by the existence of [`Device`] and its safety
-        // requirements. The OPP core guarantees not to access fields of [`Config`] after this call
-        // and so we don't need to save a copy of them for future use.
-        let ret = unsafe { bindings::dev_pm_opp_set_config(dev.as_raw(), &mut config) };
-
-        to_result(ret).map(|()| ConfigToken(ret))
+        set_config()
     }
 
     /// Config's clk callback.
tools/power/cpupower/Makefile (+21 -11)
···
 # cpufreq-bench benchmarking tool
 CPUFREQ_BENCH ?= true
 
-# Do not build libraries, but build the code in statically
-# Libraries are still built, otherwise the Makefile code would
-# be rather ugly.
+# Build the code, including libraries, statically.
 export STATIC ?= false
 
 # Prefix to the directories we're installing to
···
 	$(ECHO) "  CC   " $@
 	$(QUIET) $(CC) $(CFLAGS) -fPIC -o $@ -c lib/$*.c
 
-$(OUTPUT)libcpupower.so.$(LIB_VER): $(LIB_OBJS)
+ifeq ($(strip $(STATIC)),true)
+LIBCPUPOWER := libcpupower.a
+else
+LIBCPUPOWER := libcpupower.so.$(LIB_VER)
+endif
+
+$(OUTPUT)$(LIBCPUPOWER): $(LIB_OBJS)
+ifeq ($(strip $(STATIC)),true)
+	$(ECHO) "  AR   " $@
+	$(QUIET) $(AR) rcs $@ $(LIB_OBJS)
+else
 	$(ECHO) "  LD   " $@
 	$(QUIET) $(CC) -shared $(CFLAGS) $(LDFLAGS) -o $@ \
 		-Wl,-soname,libcpupower.so.$(LIB_MAJ) $(LIB_OBJS)
 	@ln -sf $(@F) $(OUTPUT)libcpupower.so
 	@ln -sf $(@F) $(OUTPUT)libcpupower.so.$(LIB_MAJ)
+endif
 
-libcpupower: $(OUTPUT)libcpupower.so.$(LIB_VER)
+libcpupower: $(OUTPUT)$(LIBCPUPOWER)
 
 # Let all .o files depend on its .c file and all headers
 # Might be worth to put this into utils/Makefile at some point of time
···
 	$(ECHO) "  CC   " $@
 	$(QUIET) $(CC) $(CFLAGS) -I./lib -I ./utils -o $@ -c $*.c
 
-$(OUTPUT)cpupower: $(UTIL_OBJS) $(OUTPUT)libcpupower.so.$(LIB_VER)
+$(OUTPUT)cpupower: $(UTIL_OBJS) $(OUTPUT)$(LIBCPUPOWER)
 	$(ECHO) "  CC   " $@
 ifeq ($(strip $(STATIC)),true)
 	$(QUIET) $(CC) $(CFLAGS) $(LDFLAGS) $(UTIL_OBJS) -lrt -lpci -L$(OUTPUT) -o $@
···
 	done;
 endif
 
-compile-bench: $(OUTPUT)libcpupower.so.$(LIB_VER)
+compile-bench: $(OUTPUT)$(LIBCPUPOWER)
 	@V=$(V) confdir=$(confdir) $(MAKE) -C bench O=$(OUTPUT)
 
 # we compile into subdirectories. if the target directory is not the
···
 	-find $(OUTPUT) \( -not -type d \) -and \( -name '*~' -o -name '*.[oas]' \) -type f -print \
 	 | xargs rm -f
 	-rm -f $(OUTPUT)cpupower
+	-rm -f $(OUTPUT)libcpupower.a
 	-rm -f $(OUTPUT)libcpupower.so*
 	-rm -rf $(OUTPUT)po/*.gmo
 	-rm -rf $(OUTPUT)po/*.pot
···
 
 install-lib: libcpupower
 	$(INSTALL) -d $(DESTDIR)${libdir}
+ifeq ($(strip $(STATIC)),true)
+	$(CP) $(OUTPUT)libcpupower.a $(DESTDIR)${libdir}/
+else
 	$(CP) $(OUTPUT)libcpupower.so* $(DESTDIR)${libdir}/
+endif
 	$(INSTALL) -d $(DESTDIR)${includedir}
 	$(INSTALL_DATA) lib/cpufreq.h $(DESTDIR)${includedir}/cpufreq.h
 	$(INSTALL_DATA) lib/cpuidle.h $(DESTDIR)${includedir}/cpuidle.h
···
 	@#DESTDIR must be set from outside to survive
 	@sbindir=$(sbindir) bindir=$(bindir) docdir=$(docdir) confdir=$(confdir) $(MAKE) -C bench O=$(OUTPUT) install
 
-ifeq ($(strip $(STATIC)),true)
-install: all install-tools install-man $(INSTALL_NLS) $(INSTALL_BENCH)
-else
 install: all install-lib install-tools install-man $(INSTALL_NLS) $(INSTALL_BENCH)
-endif
 
 uninstall:
 	- rm -f $(DESTDIR)${libdir}/libcpupower.*