x86/hpet: Use another crystalball to evaluate HPET usability

On recent Intel systems the HPET stops working when the system reaches the
PC10 idle state.

The approach of adding PCI ids to the early quirks to disable HPET on
these systems is a whack-a-mole game which makes no sense.

Check for PC10 instead and force disable HPET if supported. The check is
overbroad as it does not take ACPI, intel_idle enablement and command
line parameters into account. That's fine as long as there is at least
PMTIMER available to calibrate the TSC frequency. The decision can be
overruled by adding "hpet=force" on the kernel command line.
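
As a rough illustration of that decision, here is a minimal userspace
sketch in C. It is not the kernel code: the `sketch_` names and the
`hpet_verdict` enum are made up for this example, the register values are
passed in as plain integers instead of being read via CPUID and RDMSR, and
the vendor, MWAIT feature and CPUID level checks are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 5 ECX bits, same values as the kernel's CPUID5_ECX_* defines. */
#define SKETCH_ECX_EXTENSIONS_SUPPORTED	0x1u
#define SKETCH_ECX_INTERRUPT_BREAK	0x2u

/* Hypothetical result type; the patch itself returns a bool and sets a flag. */
enum hpet_verdict {
	HPET_KEEP,		/* PC10 not reachable, HPET stays usable */
	HPET_KEEP_FORCED,	/* PC10 reachable, but "hpet=force" was given */
	HPET_DISABLE,		/* PC10 reachable, HPET gets force disabled */
};

/* MWAIT extensions present and EDX[31:28] reports at least one substate. */
static bool sketch_pc10_substates(uint32_t ecx, uint32_t edx)
{
	return (ecx & SKETCH_ECX_EXTENSIONS_SUPPORTED) &&
	       (ecx & SKETCH_ECX_INTERRUPT_BREAK) &&
	       (edx & (0xFu << 28));
}

/* Same order of checks as the patch: substates, PKG C-state limit, override. */
static enum hpet_verdict sketch_hpet_verdict(uint32_t ecx, uint32_t edx,
					     uint64_t pkg_cst_config,
					     bool hpet_force)
{
	if (!sketch_pc10_substates(ecx, edx))
		return HPET_KEEP;

	/* The low four bits hold the package C-state limit. */
	if ((pkg_cst_config & 0xF) < 8)
		return HPET_KEEP;

	return hpet_force ? HPET_KEEP_FORCED : HPET_DISABLE;
}
```

With both ECX bits set, a non-zero EDX[31:28] field and a package C-state
limit of 8 or higher, the sketch yields HPET_DISABLE unless the caller
states that "hpet=force" was given.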

Remove the related early PCI quirks for affected Ice Cake and Coffin Lake
systems as they are no longer required. That should also cover all
other systems, i.e. Tiger Rag and newer generations, which are most
likely affected by this as well.

Fixes: Yet another hardware trainwreck
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Cc: stable@vger.kernel.org
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>

+81 -6
-6
arch/x86/kernel/early-quirks.c
···
 	 */
 	{ PCI_VENDOR_ID_INTEL, 0x0f00,
 		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
-	{ PCI_VENDOR_ID_INTEL, 0x3e20,
-		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
-	{ PCI_VENDOR_ID_INTEL, 0x3ec4,
-		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
-	{ PCI_VENDOR_ID_INTEL, 0x8a12,
-		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
 	{ PCI_VENDOR_ID_BROADCOM, 0x4331,
 	  PCI_CLASS_NETWORK_OTHER, PCI_ANY_ID, 0, apple_airport_reset},
 	{}
+81
arch/x86/kernel/hpet.c
···
 #include <asm/irq_remapping.h>
 #include <asm/hpet.h>
 #include <asm/time.h>
+#include <asm/mwait.h>

 #undef pr_fmt
 #define pr_fmt(fmt) "hpet: " fmt
···
 	return false;
 }

+static bool __init mwait_pc10_supported(void)
+{
+	unsigned int eax, ebx, ecx, mwait_substates;
+
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		return false;
+
+	if (!cpu_feature_enabled(X86_FEATURE_MWAIT))
+		return false;
+
+	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
+		return false;
+
+	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &mwait_substates);
+
+	return (ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) &&
+	       (ecx & CPUID5_ECX_INTERRUPT_BREAK) &&
+	       (mwait_substates & (0xF << 28));
+}
+
+/*
+ * Check whether the system supports PC10. If so force disable HPET as that
+ * stops counting in PC10. This check is overbroad as it does not take any
+ * of the following into account:
+ *
+ *	- ACPI tables
+ *	- Enablement of intel_idle
+ *	- Command line arguments which limit intel_idle C-state support
+ *
+ * That's perfectly fine. HPET is a piece of hardware designed by committee
+ * and the only reason why it is still in use on modern systems is the
+ * fact that it is impossible to reliably query TSC and CPU frequency via
+ * CPUID or firmware.
+ *
+ * If HPET is functional it is useful for calibrating TSC, but this can be
+ * done via PMTIMER as well which seems to be the last remaining timer on
+ * X86/INTEL platforms that has not been completely wreckaged by feature
+ * creep.
+ *
+ * In theory HPET support should be removed altogether, but there are older
+ * systems out there which depend on it because TSC and APIC timer are
+ * dysfunctional in deeper C-states.
+ *
+ * It's only 20 years now that hardware people have been asked to provide
+ * reliable and discoverable facilities which can be used for timekeeping
+ * and per CPU timer interrupts.
+ *
+ * The probability that this problem is going to be solved in the
+ * foreseeable future is close to zero, so the kernel has to be cluttered
+ * with heuristics to keep up with the ever growing amount of hardware and
+ * firmware trainwrecks. Hopefully some day hardware people will understand
+ * that the approach of "This can be fixed in software" is not sustainable.
+ * Hope dies last...
+ */
+static bool __init hpet_is_pc10_damaged(void)
+{
+	unsigned long long pcfg;
+
+	/* Check whether PC10 substates are supported */
+	if (!mwait_pc10_supported())
+		return false;
+
+	/* Check whether PC10 is enabled in PKG C-state limit */
+	rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
+	if ((pcfg & 0xF) < 8)
+		return false;
+
+	if (hpet_force_user) {
+		pr_warn("HPET force enabled via command line, but dysfunctional in PC10.\n");
+		return false;
+	}
+
+	pr_info("HPET dysfunctional in PC10. Force disabled.\n");
+	boot_hpet_disable = true;
+	return true;
+}
+
 /**
  * hpet_enable - Try to setup the HPET timer. Returns 1 on success.
  */
···
 	u64 freq;

 	if (!is_hpet_capable())
+		return 0;
+
+	if (hpet_is_pc10_damaged())
 		return 0;

 	hpet_set_mapping();
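
The comment in hpet_is_pc10_damaged() notes that TSC calibration can be
done via PMTIMER when HPET is unavailable. The arithmetic behind such a
calibration can be sketched as below. This is an illustration only and not
the kernel's calibration routine: `sketch_tsc_khz` and its parameters are
hypothetical, the counter samples are supplied by the caller, and a 24-bit
PM timer running at the ACPI-defined 3.579545 MHz is assumed.

```c
#include <stdint.h>

#define SKETCH_PMTMR_HZ		3579545ULL	/* ACPI PM timer frequency */
#define SKETCH_PMTMR_MASK	0xFFFFFFu	/* assumes a 24-bit counter */

/*
 * Derive the TSC frequency in kHz from two (TSC, PM timer) sample pairs
 * taken at the start and the end of a measurement window. Returns 0 when
 * the PM timer did not advance, i.e. when no estimate is possible.
 */
static uint64_t sketch_tsc_khz(uint64_t tsc_start, uint64_t tsc_end,
			       uint32_t pm_start, uint32_t pm_end)
{
	uint64_t tsc_delta = tsc_end - tsc_start;
	/* Masking handles a single wraparound of the 24-bit counter. */
	uint32_t pm_delta = (pm_end - pm_start) & SKETCH_PMTMR_MASK;

	if (!pm_delta)
		return 0;

	/* TSC cycles per second = tsc_delta * PMTMR_HZ / pm_delta */
	return tsc_delta * SKETCH_PMTMR_HZ / pm_delta / 1000;
}
```

The window has to stay well below one wrap of the PM timer (roughly 4.7
seconds for a 24-bit counter), otherwise the masked delta is ambiguous.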