Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

ipmi:msghandler: retry to get device id on an error

We fail to get the BMC's device id with low probability when loading
the ipmi driver, which causes BMC device registration to fail. When this
issue occurs, the kernel prints the following:

[Wed Sep 9 19:52:03 2020] ipmi_si IPI0001:00: IPMI message handler:
device id demangle failed: -22
[Wed Sep 9 19:52:03 2020] IPMI BT: using default values
[Wed Sep 9 19:52:03 2020] IPMI BT: req2rsp=5 secs retries=2
[Wed Sep 9 19:52:03 2020] ipmi_si IPI0001:00: Unable to get the
device id: -5
[Wed Sep 9 19:52:04 2020] ipmi_si IPI0001:00: Unable to register
device: error -5

When this issue happens, we want to manually unload the driver and load
it again, but it can't be unloaded with 'rmmod' because it is already
'in use'.

We added a print in handle_one_recv_msg(). When this issue happens,
the msg we receive is "Recv: 1c 01 d5", which means data_len is 1 and
data[0] is 0xd5 (the completion code), meaning "bmc cannot execute
command. Command, or request parameter(s), not supported in present
state". Debug code:
	static int handle_one_recv_msg(struct ipmi_smi *intf,
				       struct ipmi_smi_msg *msg) {
		printk("Recv: %*ph\n", msg->rsp_size, msg->rsp);
		... ...
	}
Then ipmi_demangle_device_id() returned '-EINVAL' because 'data_len < 7'
and 'data[0] != 0'.
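
The byte layout described above can be sketched in user-space C. This is only an illustration, not kernel code: `struct rsp_view`, `split_rsp()` and `device_id_usable()` are hypothetical names, and the usability check is a simplification of the 'data_len < 7' / 'data[0] != 0' conditions mentioned in this message. It assumes the first raw byte carries NetFn in the upper six bits and LUN in the lower two, the second byte is the command, and the remaining bytes are the data, starting with the completion code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a raw response buffer; not a kernel structure. */
struct rsp_view {
	uint8_t netfn;        /* upper 6 bits of byte 0 */
	uint8_t lun;          /* lower 2 bits of byte 0 */
	uint8_t cmd;          /* byte 1 */
	const uint8_t *data;  /* data[0] is the completion code */
	size_t data_len;      /* number of bytes after netfn/lun and cmd */
};

/* Split a raw response; returns 0 on success, -1 if it is too short. */
static int split_rsp(const uint8_t *rsp, size_t rsp_size, struct rsp_view *out)
{
	assert(rsp != NULL && out != NULL);
	if (rsp_size < 3)	/* need netfn/lun, cmd and a completion code */
		return -1;
	out->netfn = rsp[0] >> 2;
	out->lun = rsp[0] & 0x3;
	out->cmd = rsp[1];
	out->data = rsp + 2;
	out->data_len = rsp_size - 2;
	return 0;
}

/* Simplified check: enough data and a zero completion code. */
static int device_id_usable(const struct rsp_view *v)
{
	return v->data_len >= 7 && v->data[0] == 0;
}
```

With the bytes `1c 01 d5`, this yields NetFn 0x07, command 0x01, data_len 1 and completion code 0xd5, so the response is rejected as a device id.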

We created this patch to retry the get device id command when this
error happens. We reproduced the issue again and the first retry
succeeded; we finally got the correct msg and then all was ok:
Recv: 1c 01 00 01 81 05 84 02 af db 07 00 01 00 b9 00 10 00

So use a retry mechanism in this patch to give the BMC more opportunity
to respond correctly to the kernel when we receive specific completion
codes.
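
The retry policy can be sketched outside the kernel as a bounded loop. This is a minimal sketch under stated assumptions, not the patch itself: `struct fake_bmc`, `fake_get_device_id()`, `cc_is_retryable()` and `get_device_id_with_retry()` are hypothetical stand-ins, the 500 ms sleep between attempts is left out, and the completion code is held in a `uint8_t` so comparisons against values such as 0xd5 do not depend on whether plain `char` is signed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CC_DEVICE_IN_FW_UPDATE 0xd1
#define CC_DEVICE_IN_INIT      0xd2
#define CC_NOT_IN_MY_STATE     0xd5
#define MAX_RETRY              5	/* same bound as GET_DEVICE_ID_MAX_RETRY */

/* Hypothetical BMC stand-in: answers busy_cc a few times, then succeeds. */
struct fake_bmc {
	unsigned int busy_replies;
	uint8_t busy_cc;
};

/* Returns the completion code of one simulated Get Device ID exchange. */
static uint8_t fake_get_device_id(struct fake_bmc *bmc)
{
	if (bmc->busy_replies > 0) {
		bmc->busy_replies--;
		return bmc->busy_cc;
	}
	return 0;
}

/* Only the three "not ready yet" completion codes are worth retrying. */
static int cc_is_retryable(uint8_t cc)
{
	return cc == CC_DEVICE_IN_FW_UPDATE ||
	       cc == CC_DEVICE_IN_INIT ||
	       cc == CC_NOT_IN_MY_STATE;
}

/* Returns 0 on success, -1 on failure; *attempts counts exchanges made. */
static int get_device_id_with_retry(struct fake_bmc *bmc, unsigned int *attempts)
{
	unsigned int retry_count = 0;

	assert(bmc != NULL && attempts != NULL);
	*attempts = 0;
	for (;;) {
		uint8_t cc = fake_get_device_id(bmc);

		++*attempts;
		if (cc == 0)
			return 0;
		if (!cc_is_retryable(cc) || ++retry_count > MAX_RETRY)
			return -1;
		/* The patch sleeps 500 ms here (msleep(500)); omitted in this sketch. */
	}
}
```

A fake BMC that answers 0xd5 once and then succeeds needs two attempts, which corresponds to the "first retry succeeded" case above. One that stays busy is abandoned after the initial attempt plus five retries, and any other non-zero completion code fails immediately without a retry.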

Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
Message-Id: <20200915071817.4484-1-tian.xianting@h3c.com>
[Cleaned up the verbiage a bit in the header and prints.]
Signed-off-by: Corey Minyard <cminyard@mvista.com>

authored by Xianting Tian and committed by Corey Minyard
f8910ffa c2b1e76d

+27 -4
+25 -4
drivers/char/ipmi/ipmi_msghandler.c
···
 #include <linux/uuid.h>
 #include <linux/nospec.h>
 #include <linux/vmalloc.h>
+#include <linux/delay.h>
 
 #define IPMI_DRIVER_VERSION "39.2"
 
···
 #else
 #define IPMI_PANIC_DEFAULT IPMI_SEND_PANIC_EVENT_NONE
 #endif
+
+#define GET_DEVICE_ID_MAX_RETRY	5
+
 static enum ipmi_panic_event_op ipmi_send_panic_event = IPMI_PANIC_DEFAULT;
 
 static int panic_op_write_handler(const char *val,
···
 	int dyn_guid_set;
 	struct kref usecount;
 	struct work_struct remove_work;
+	char cc; /* completion code */
 };
 #define to_bmc_device(x) container_of((x), struct bmc_device, pdev.dev)
 
···
 			msg->msg.data, msg->msg.data_len, &intf->bmc->fetch_id);
 	if (rv) {
 		dev_warn(intf->si_dev, "device id demangle failed: %d\n", rv);
+		/* record completion code when error */
+		intf->bmc->cc = msg->msg.data[0];
 		intf->bmc->dyn_id_set = 0;
 	} else {
 		/*
···
 static int __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc)
 {
 	int rv;
-
-	bmc->dyn_id_set = 2;
+	unsigned int retry_count = 0;
 
 	intf->null_user_handler = bmc_device_id_handler;
+
+retry:
+	bmc->cc = 0;
+	bmc->dyn_id_set = 2;
 
 	rv = send_get_device_id_cmd(intf);
 	if (rv)
···
 
 	wait_event(intf->waitq, bmc->dyn_id_set != 2);
 
-	if (!bmc->dyn_id_set)
+	if (!bmc->dyn_id_set) {
+		if ((bmc->cc == IPMI_DEVICE_IN_FW_UPDATE_ERR
+		     || bmc->cc == IPMI_DEVICE_IN_INIT_ERR
+		     || bmc->cc == IPMI_NOT_IN_MY_STATE_ERR)
+		     && ++retry_count <= GET_DEVICE_ID_MAX_RETRY) {
+			msleep(500);
+			dev_warn(intf->si_dev,
+			    "BMC returned 0x%2.2x, retry get bmc device id\n",
+			    bmc->cc);
+			goto retry;
+		}
+
 		rv = -EIO; /* Something went wrong in the fetch. */
+	}
 
 	/* dyn_id_set makes the id data available. */
 	smp_rmb();
···
 	/* It's the one we want */
 	if (msg->msg.data[0] != 0) {
 		/* Got an error from the channel, just go on. */
-
 		if (msg->msg.data[0] == IPMI_INVALID_COMMAND_ERR) {
 			/*
 			 * If the MC does not support this
+2
include/uapi/linux/ipmi_msgdefs.h
···
 #define IPMI_ERR_MSG_TRUNCATED		0xc6
 #define IPMI_REQ_LEN_INVALID_ERR	0xc7
 #define IPMI_REQ_LEN_EXCEEDED_ERR	0xc8
+#define IPMI_DEVICE_IN_FW_UPDATE_ERR	0xd1
+#define IPMI_DEVICE_IN_INIT_ERR		0xd2
 #define IPMI_NOT_IN_MY_STATE_ERR	0xd5 /* IPMI 2.0 */
 #define IPMI_LOST_ARBITRATION_ERR	0x81
 #define IPMI_BUS_ERR			0x82