Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'rds'

Sowmini Varadhan says:

====================
RDS: RDS-core fixes

This patch-series updates the RDS core and rds-tcp modules with
some bug fixes that were originally authored by Andy Grover,
Zach Brown, and Chris Mason.

v2: Code review comment by Sergei Shtylov
v3: DaveM comments:
- dropped patches 3, 5 for "heuristic" changes in rds_send_xmit().
Investigation into the root-cause of these IB-triggered changes
produced the feedback: "I don't remember seeing "RDS: Stuck RM"
message in last 1-1.5 years and checking with other folks. It may very
well be some old workaround for stale connection for which long term
fix is already made and this part of code not exercised anymore."

Any such fixes, *if* they are needed, can/should be done in the
IB specific RDS transport modules.

- similarly dropped the LL_SEND_FULL patch (patch 6 in v2 set)

v4: Documentation/networking/rds.txt contains incorrect references
to "missing sysctl values for pf_rds and sol_rds in mainline".
The sysctl values were never needed in mainline, thus fix the
documentation.

v5: Clarify comment per http://www.spinics.net/lists/netdev/msg324220.html

v6: Re-added entire version history to cover letter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

+38 -8
+4 -5
Documentation/networking/rds.txt
···
 ================

 AF_RDS, PF_RDS, SOL_RDS
-	These constants haven't been assigned yet, because RDS isn't in
-	mainline yet. Currently, the kernel module assigns some constant
-	and publishes it to user space through two sysctl files
-		/proc/sys/net/rds/pf_rds
-		/proc/sys/net/rds/sol_rds
+	AF_RDS and PF_RDS are the domain type to be used with socket(2)
+	to create RDS sockets. SOL_RDS is the socket-level to be used
+	with setsockopt(2) and getsockopt(2) for RDS specific socket
+	options.

 fd = socket(PF_RDS, SOCK_SEQPACKET, 0);
 	This creates a new, unbound RDS socket.
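Reviewer note: the documentation hunk above says the constants come from the regular socket headers rather than from sysctl files. A minimal user-space sketch of that usage follows. The helper name `open_rds_socket` is hypothetical, and the `#ifndef` fallbacks are an assumption for older libc headers (the values 21 and 276 are the ones in include/linux/socket.h). The call is expected to fail on systems where the rds module is not available.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fallbacks for libc headers that predate RDS (assumption: Linux values). */
#ifndef AF_RDS
#define AF_RDS 21
#endif
#ifndef PF_RDS
#define PF_RDS AF_RDS
#endif
#ifndef SOL_RDS
#define SOL_RDS 276
#endif

/*
 * Hypothetical helper: open a new, unbound RDS socket.
 * Returns the file descriptor, or -1 if socket(2) fails
 * (for example when the kernel has no RDS support loaded).
 */
int open_rds_socket(void)
{
	int fd;

	/* On Linux the PF_ and AF_ names are aliases for the same value. */
	assert(PF_RDS == AF_RDS);

	fd = socket(PF_RDS, SOCK_SEQPACKET, 0);
	if (fd < 0)
		fprintf(stderr, "socket(PF_RDS): %s\n", strerror(errno));
	return fd;
}
```

A caller would pass SOL_RDS as the level argument to setsockopt(2) or getsockopt(2) on the returned descriptor, and close(2) it when done.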
+2 -1
net/rds/connection.c
···
 	rcu_read_lock();
 	conn = rds_conn_lookup(head, laddr, faddr, trans);
 	if (conn && conn->c_loopback && conn->c_trans != &rds_loop_transport &&
-	    !is_outgoing) {
+	    laddr == faddr && !is_outgoing) {
 		/* This is a looped back IB connection, and we're
 		 * called by the code handling the incoming connect.
 		 * We need a second connection object into which we
···
 	}

 	atomic_set(&conn->c_state, RDS_CONN_DOWN);
+	conn->c_send_gen = 0;
 	conn->c_reconnect_jiffies = 0;
 	INIT_DELAYED_WORK(&conn->c_send_w, rds_send_worker);
 	INIT_DELAYED_WORK(&conn->c_recv_w, rds_recv_worker);
+1
net/rds/rds.h
···
 	void *c_transport_data;

 	atomic_t c_state;
+	unsigned long c_send_gen;
 	unsigned long c_flags;
 	unsigned long c_reconnect_jiffies;
 	struct delayed_work c_send_w;
+31 -2
net/rds/send.c
···
 	struct scatterlist *sg;
 	int ret = 0;
 	LIST_HEAD(to_be_dropped);
+	int batch_count;
+	unsigned long send_gen = 0;

 restart:
+	batch_count = 0;

 	/*
 	 * sendmsg calls here after having queued its message on the send
···
 		ret = -ENOMEM;
 		goto out;
 	}
+
+	/*
+	 * we record the send generation after doing the xmit acquire.
+	 * if someone else manages to jump in and do some work, we'll use
+	 * this to avoid a goto restart farther down.
+	 *
+	 * The acquire_in_xmit() check above ensures that only one
+	 * caller can increment c_send_gen at any time.
+	 */
+	conn->c_send_gen++;
+	send_gen = conn->c_send_gen;

 	/*
 	 * rds_conn_shutdown() sets the conn state and then tests RDS_IN_XMIT,
···
 		 */
 		if (!rm) {
 			unsigned int len;
+
+			batch_count++;
+
+			/* we want to process as big a batch as we can, but
+			 * we also want to avoid softlockups. If we've been
+			 * through a lot of messages, lets back off and see
+			 * if anyone else jumps in
+			 */
+			if (batch_count >= 1024)
+				goto over_batch;

 			spin_lock_irqsave(&conn->c_lock, flags);

···
 		}
 	}

+over_batch:
 	if (conn->c_trans->xmit_complete)
 		conn->c_trans->xmit_complete(conn);
-
 	release_in_xmit(conn);

 	/* Nuke any messages we decided not to retransmit. */
···
 	 * If the transport cannot continue (i.e ret != 0), then it must
 	 * call us when more room is available, such as from the tx
 	 * completion handler.
+	 *
+	 * We have an extra generation check here so that if someone manages
+	 * to jump in after our release_in_xmit, we'll see that they have done
+	 * some work and we will skip our goto
 	 */
 	if (ret == 0) {
 		smp_mb();
-		if (!list_empty(&conn->c_send_queue)) {
+		if (!list_empty(&conn->c_send_queue) &&
+		    send_gen == conn->c_send_gen) {
 			rds_stats_inc(s_send_lock_queue_raced);
 			goto restart;
 		}
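Reviewer note: the interaction between the batch limit and the generation check in the send.c hunks can be hard to follow from the diff alone. The following single-threaded toy model is a sketch under stated assumptions, not the kernel code. `struct toy_conn`, `toy_send_xmit`, `toy_other_sender`, and the `after_release` hook are all hypothetical names. The hook stands in for another sender that acquires the transmit path right after this caller releases it; in the kernel that would happen concurrently on another CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_LIMIT 1024	/* same threshold the patch uses */

/* Toy stand-in for struct rds_connection. Only send_gen mirrors a real
 * field (c_send_gen); the other fields are made up for this sketch. */
struct toy_conn {
	bool in_xmit;		/* models the RDS_IN_XMIT bit */
	unsigned long send_gen;	/* bumped by whoever holds in_xmit */
	unsigned long queued;	/* messages still on the send queue */
	unsigned long sent;	/* messages handed to the transport */
	unsigned long restarts;	/* how often we took the goto restart */
};

typedef void (*race_hook)(struct toy_conn *);

/* Returns 0 on success, -1 if another caller already holds in_xmit. */
int toy_send_xmit(struct toy_conn *c, race_hook after_release)
{
	unsigned long send_gen;
	int batch_count;

restart:
	batch_count = 0;

	if (c->in_xmit)
		return -1;
	c->in_xmit = true;

	/* Record the generation while we own the transmit path. */
	c->send_gen++;
	send_gen = c->send_gen;

	while (c->queued > 0) {
		batch_count++;
		if (batch_count >= BATCH_LIMIT)
			break;	/* corresponds to goto over_batch */
		c->queued--;
		c->sent++;
	}

	c->in_xmit = false;

	/* Simulated window in which another sender may jump in. */
	if (after_release)
		after_release(c);

	/* Restart only if work remains and nobody else has taken over. */
	if (c->queued > 0 && send_gen == c->send_gen) {
		c->restarts++;
		goto restart;
	}
	return 0;
}

/* Simulated competitor: acquiring the path bumps the generation. */
void toy_other_sender(struct toy_conn *c)
{
	c->send_gen++;
}
```

In this model, a connection with 3000 queued messages and no competitor drains in three passes of at most 1023 messages each, taking the restart path twice. With `toy_other_sender` as the hook, the first caller sends one batch and returns without restarting, leaving the remaining messages to the sender that bumped the generation.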