Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drbd: Ignore the exit code of a fence-peer handler if it returns too late

If the connection was established and lost again before a
fence-peer handler returns, ignore the exit code of that
instance and use the exit code of the later-started instance instead.

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Authored by Philipp Reisner; committed by Jens Axboe
28e448bb f9eb7bf4

+17 -3
+1
drivers/block/drbd/drbd_int.h
···
 	unsigned susp_nod:1;		/* IO suspended because no data */
 	unsigned susp_fen:1;		/* IO suspended because fence peer handler runs */
 	struct mutex cstate_mutex;	/* Protects graceful disconnects */
+	unsigned int connect_cnt;	/* Inc each time a connection is established */
 
 	unsigned long flags;
 	struct net_conf *net_conf;	/* content protected by rcu */
+13 -2
drivers/block/drbd/drbd_nl.c
···
 
 bool conn_try_outdate_peer(struct drbd_tconn *tconn)
 {
+	unsigned int connect_cnt;
 	union drbd_state mask = { };
 	union drbd_state val = { };
 	enum drbd_fencing_p fp;
···
 		conn_err(tconn, "Expected cstate < C_WF_REPORT_PARAMS\n");
 		return false;
 	}
+
+	spin_lock_irq(&tconn->req_lock);
+	connect_cnt = tconn->connect_cnt;
+	spin_unlock_irq(&tconn->req_lock);
 
 	fp = highest_fencing_policy(tconn);
 	switch (fp) {
···
 	   here, because we might were able to re-establish the connection in the
 	   meantime. */
 	spin_lock_irq(&tconn->req_lock);
-	if (tconn->cstate < C_WF_REPORT_PARAMS && !test_bit(STATE_SENT, &tconn->flags))
-		_conn_request_state(tconn, mask, val, CS_VERBOSE);
+	if (tconn->cstate < C_WF_REPORT_PARAMS && !test_bit(STATE_SENT, &tconn->flags)) {
+		if (tconn->connect_cnt != connect_cnt)
+			/* In case the connection was established and droped
+			   while the fence-peer handler was running, ignore it */
+			conn_info(tconn, "Ignoring fence-peer exit code\n");
+		else
+			_conn_request_state(tconn, mask, val, CS_VERBOSE);
+	}
 	spin_unlock_irq(&tconn->req_lock);
 
 	return conn_highest_pdsk(tconn) <= D_OUTDATED;
+3 -1
drivers/block/drbd/drbd_state.c
···
 			drbd_thread_restart_nowait(&mdev->tconn->receiver);
 
 	/* Resume AL writing if we get a connection */
-	if (os.conn < C_CONNECTED && ns.conn >= C_CONNECTED)
+	if (os.conn < C_CONNECTED && ns.conn >= C_CONNECTED) {
 		drbd_resume_al(mdev);
+		mdev->tconn->connect_cnt++;
+	}
 
 	/* remember last attach time so request_timer_fn() won't
 	 * kill newly established sessions while we are still trying to thaw
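
The pattern this patch relies on (snapshot a connection counter under the lock, run the slow handler without the lock, then compare the counter before acting on the result) can be sketched in userspace C. This is only an illustration under stated assumptions, not the DRBD code: `struct conn`, `conn_init`, `conn_established`, `conn_lost`, `try_outdate_peer` and both sample handlers are hypothetical names, a pthread mutex stands in for `req_lock`, and treating a return value of 0 as "peer outdated" is an assumption made for the sketch rather than DRBD's fence-peer exit code convention.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few drbd_tconn fields the patch touches. */
struct conn {
	pthread_mutex_t lock;		/* plays the role of req_lock */
	unsigned int connect_cnt;	/* incremented each time a connection is established */
	bool connected;
	bool peer_outdated;
};

typedef int (*fence_handler_fn)(struct conn *c);

void conn_init(struct conn *c)
{
	pthread_mutex_init(&c->lock, NULL);
	c->connect_cnt = 0;
	c->connected = false;
	c->peer_outdated = false;
}

void conn_established(struct conn *c)
{
	pthread_mutex_lock(&c->lock);
	c->connected = true;
	c->connect_cnt++;
	pthread_mutex_unlock(&c->lock);
}

void conn_lost(struct conn *c)
{
	pthread_mutex_lock(&c->lock);
	c->connected = false;
	pthread_mutex_unlock(&c->lock);
}

/* Returns true if the handler's result was applied, false if it was ignored. */
bool try_outdate_peer(struct conn *c, fence_handler_fn handler)
{
	unsigned int snapshot;
	bool applied = false;
	int rc;

	/* Remember which connection "generation" this handler run belongs to. */
	pthread_mutex_lock(&c->lock);
	snapshot = c->connect_cnt;
	pthread_mutex_unlock(&c->lock);

	rc = handler(c);	/* potentially slow; the lock is not held here */

	pthread_mutex_lock(&c->lock);
	if (!c->connected && c->connect_cnt == snapshot) {
		/* Assumption for this sketch: 0 means the peer was outdated. */
		c->peer_outdated = (rc == 0);
		applied = true;
	}
	/* Otherwise the link reconnected (and maybe dropped again) while the
	 * handler ran, so its result is stale and is discarded. */
	pthread_mutex_unlock(&c->lock);

	return applied;
}

/* Sample handler: returns immediately. */
int quick_handler(struct conn *c)
{
	(void)c;
	return 0;
}

/* Sample handler: the link comes up and drops again while it is running. */
int flapping_handler(struct conn *c)
{
	conn_established(c);
	conn_lost(c);
	return 0;
}
```

With a disconnected `struct conn`, `try_outdate_peer(&c, quick_handler)` applies the result, while `try_outdate_peer(&c, flapping_handler)` discards it because `connect_cnt` changed during the handler run. Comparing a counter rather than the connection state alone matters because the state can look identical before and after the handler even though a full connect/disconnect cycle happened in between.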