Linux kernel ============ This file was moved to Documentation/admin-guide/README.rst Please notice that there are several guides for kernel developers and users. These guides can be rendered in a number of formats, like HTML and PDF. In order to build the documentation, use ``make htmldocs`` or ``make pdfdocs``. There are various text files in the Documentation/ subdirectory, several of them using the Restructured Text markup notation. See Documentation/00-INDEX for a list of what is contained in each file. Please read the Documentation/process/changes.rst file, as it contains the requirements for building and running the kernel, and information about the problems which may result by upgrading your kernel.
Configure Feed
Select the types of activity you want to include in your feed.
Clone this repository
For self-hosted knots, clone URLs may differ based on your setup.
Download tar.gz
After refcnt reaches zero, vlan_vid_del() could free
dev->vlan_info via RCU:
RCU_INIT_POINTER(dev->vlan_info, NULL);
call_rcu(&vlan_info->rcu, vlan_info_rcu_free);
However, the pointer 'grp' still points to that memory
since it is set before vlan_vid_del():
vlan_info = rtnl_dereference(dev->vlan_info);
if (!vlan_info)
goto out;
grp = &vlan_info->grp;
Depends on when that RCU callback is scheduled, we could
trigger a use-after-free in vlan_group_for_each_dev()
right following this vlan_vid_del().
Fix it by moving vlan_vid_del() before setting grp. This
is also symmetric to the vlan_vid_add() we call in
vlan_device_event().
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Fixes: efc73f4bbc23 ("net: Fix memory leak - vlan_info struct")
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code does not return after successfully preparing the VLAN
addition on every ports member of a it. Fix this.
Fixes: 1ca4aa9cd4cc ("net: dsa: check VLAN capability of every switch")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code does not return after successfully preparing the MDB
addition on every ports member of a multicast group. Fix this.
Fixes: a1a6b7ea7f2d ("net: dsa: add cross-chip multicast support")
Reported-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the cause of an WARNING indicatng TCP has pending
retransmission in Open state in tcp_fastretrans_alert().
The root cause is a bad interaction between path mtu probing,
if enabled, and the RACK loss detection. Upong receiving a SACK
above the sequence of the MTU probing packet, RACK could mark the
probe packet lost in tcp_fastretrans_alert(), prior to calling
tcp_simple_retransmit().
tcp_simple_retransmit() only enters Loss state if it newly marks
the probe packet lost. If the probe packet is already identified as
lost by RACK, the sender remains in Open state with some packets
marked lost and retransmitted. Then the next SACK would trigger
the warning. The likely scenario is that the probe packet was
lost due to its size or network congestion. The actual impact of
this warning is small by potentially entering fast recovery an
ACK later.
The simple fix is always entering recovery (Loss) state if some
packet is marked lost during path MTU probing.
Fixes: a0370b3f3f2c ("tcp: enable RACK loss detection to trigger recovery")
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a GSO skb of truesize O is segmented into 2 new skbs of truesize N1
and N2, we want to transfer socket ownership to the new fresh skbs.
In order to avoid expensive atomic operations on a cache line subject to
cache bouncing, we replace the sequence :
refcount_add(N1, &sk->sk_wmem_alloc);
refcount_add(N2, &sk->sk_wmem_alloc); // repeated by number of segments
refcount_sub(O, &sk->sk_wmem_alloc);
by a single
refcount_add(sum_of(N) - O, &sk->sk_wmem_alloc);
Problem is :
In some pathological cases, sum(N) - O might be a negative number, and
syzkaller bot was apparently able to trigger this trace [1]
atomic_t was ok with this construct, but we need to take care of the
negative delta with refcount_t
[1]
refcount_t: saturated; leaking memory.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 8404 at lib/refcount.c:77 refcount_add_not_zero+0x198/0x200 lib/refcount.c:77
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 8404 Comm: syz-executor2 Not tainted 4.14.0-rc5-mm1+ #20
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
panic+0x1e4/0x41c kernel/panic.c:183
__warn+0x1c4/0x1e0 kernel/panic.c:546
report_bug+0x211/0x2d0 lib/bug.c:183
fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:177
do_trap_no_signal arch/x86/kernel/traps.c:211 [inline]
do_trap+0x260/0x390 arch/x86/kernel/traps.c:260
do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:297
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:310
invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
RIP: 0010:refcount_add_not_zero+0x198/0x200 lib/refcount.c:77
RSP: 0018:ffff8801c606e3a0 EFLAGS: 00010282
RAX: 0000000000000026 RBX: 0000000000001401 RCX: 0000000000000000
RDX: 0000000000000026 RSI: ffffc900036fc000 RDI: ffffed0038c0dc68
RBP: ffff8801c606e430 R08: 0000000000000001 R09: 0000000000000000
R10: ffff8801d97f5eba R11: 0000000000000000 R12: ffff8801d5acf73c
R13: 1ffff10038c0dc75 R14: 00000000ffffffff R15: 00000000fffff72f
refcount_add+0x1b/0x60 lib/refcount.c:101
tcp_gso_segment+0x10d0/0x16b0 net/ipv4/tcp_offload.c:155
tcp4_gso_segment+0xd4/0x310 net/ipv4/tcp_offload.c:51
inet_gso_segment+0x60c/0x11c0 net/ipv4/af_inet.c:1271
skb_mac_gso_segment+0x33f/0x660 net/core/dev.c:2749
__skb_gso_segment+0x35f/0x7f0 net/core/dev.c:2821
skb_gso_segment include/linux/netdevice.h:3971 [inline]
validate_xmit_skb+0x4ba/0xb20 net/core/dev.c:3074
__dev_queue_xmit+0xe49/0x2070 net/core/dev.c:3497
dev_queue_xmit+0x17/0x20 net/core/dev.c:3538
neigh_hh_output include/net/neighbour.h:471 [inline]
neigh_output include/net/neighbour.h:479 [inline]
ip_finish_output2+0xece/0x1460 net/ipv4/ip_output.c:229
ip_finish_output+0x85e/0xd10 net/ipv4/ip_output.c:317
NF_HOOK_COND include/linux/netfilter.h:238 [inline]
ip_output+0x1cc/0x860 net/ipv4/ip_output.c:405
dst_output include/net/dst.h:459 [inline]
ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124
ip_queue_xmit+0x8c6/0x18e0 net/ipv4/ip_output.c:504
tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1137
tcp_write_xmit+0x663/0x4de0 net/ipv4/tcp_output.c:2341
__tcp_push_pending_frames+0xa0/0x250 net/ipv4/tcp_output.c:2513
tcp_push_pending_frames include/net/tcp.h:1722 [inline]
tcp_data_snd_check net/ipv4/tcp_input.c:5050 [inline]
tcp_rcv_established+0x8c7/0x18a0 net/ipv4/tcp_input.c:5497
tcp_v4_do_rcv+0x2ab/0x7d0 net/ipv4/tcp_ipv4.c:1460
sk_backlog_rcv include/net/sock.h:909 [inline]
__release_sock+0x124/0x360 net/core/sock.c:2264
release_sock+0xa4/0x2a0 net/core/sock.c:2776
tcp_sendmsg+0x3a/0x50 net/ipv4/tcp.c:1462
inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
sock_sendmsg_nosec net/socket.c:632 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:642
___sys_sendmsg+0x31c/0x890 net/socket.c:2048
__sys_sendmmsg+0x1e6/0x5f0 net/socket.c:2138
Fixes: 14afee4b6092 ("net: convert sock.sk_wmem_alloc from atomic_t to refcount_t")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rds_ib_recv_refill() is a function that refills an IB receive
queue. It can be called from both the CQE handler (tasklet) and a
worker thread.
Just after the call to ib_post_recv(), a debug message is printed with
rdsdebug():
ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
(long) ib_sg_dma_address(
ic->i_cm_id->device,
&recv->r_frag->f_sg),
ret);
Now consider an invocation of rds_ib_recv_refill() from the worker
thread, which is preemptible. Further, assume that the worker thread
is preempted between the ib_post_recv() and rdsdebug() statements.
Then, if the preemption is due to a receive CQE event, the
rds_ib_recv_cqe_handler() will be invoked. This function processes
receive completions, including freeing up data structures, such as the
recv->r_frag.
In this scenario, rds_ib_recv_cqe_handler() will process the receive
WR posted above. That implies, that the recv->r_frag has been freed
before the above rdsdebug() statement has been executed. When it is
later executed, we will have a NULL pointer dereference:
[ 4088.068008] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 4088.076754] IP: rds_ib_recv_refill+0x87/0x620 [rds_rdma]
[ 4088.082686] PGD 0 P4D 0
[ 4088.085515] Oops: 0000 [#1] SMP
[ 4088.089015] Modules linked in: rds_rdma(OE) rds(OE) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) mlx4_ib(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) rdma_cm(E) ib_cm(E) iw_cm(E) ib_core(E) binfmt_misc(E) sb_edac(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) pcbc(E) aesni_intel(E) crypto_simd(E) iTCO_wdt(E) glue_helper(E) iTCO_vendor_support(E) sg(E) cryptd(E) pcspkr(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) shpchp(E) ioatdma(E) i2c_i801(E) wmi(E) lpc_ich(E) mei_me(E) mei(E) mfd_core(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ip_tables(E) ext4(E) mbcache(E) jbd2(E) fscrypto(E) mgag200(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E)
[ 4088.168486] fb_sys_fops(E) ahci(E) ixgbe(E) libahci(E) ttm(E) mdio(E) ptp(E) pps_core(E) drm(E) sd_mod(E) libata(E) crc32c_intel(E) mlx4_core(E) i2c_core(E) dca(E) megaraid_sas(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) [last unloaded: rds]
[ 4088.193442] CPU: 20 PID: 1244 Comm: kworker/20:2 Tainted: G OE 4.14.0-rc7.master.20171105.ol7.x86_64 #1
[ 4088.205097] Hardware name: Oracle Corporation ORACLE SERVER X5-2L/ASM,MOBO TRAY,2U, BIOS 31110000 03/03/2017
[ 4088.216074] Workqueue: ib_cm cm_work_handler [ib_cm]
[ 4088.221614] task: ffff885fa11d0000 task.stack: ffffc9000e598000
[ 4088.228224] RIP: 0010:rds_ib_recv_refill+0x87/0x620 [rds_rdma]
[ 4088.234736] RSP: 0018:ffffc9000e59bb68 EFLAGS: 00010286
[ 4088.240568] RAX: 0000000000000000 RBX: ffffc9002115d050 RCX: ffffc9002115d050
[ 4088.248535] RDX: ffffffffa0521380 RSI: ffffffffa0522158 RDI: ffffffffa0525580
[ 4088.256498] RBP: ffffc9000e59bbf8 R08: 0000000000000005 R09: 0000000000000000
[ 4088.264465] R10: 0000000000000339 R11: 0000000000000001 R12: 0000000000000000
[ 4088.272433] R13: ffff885f8c9d8000 R14: ffffffff81a0a060 R15: ffff884676268000
[ 4088.280397] FS: 0000000000000000(0000) GS:ffff885fbec80000(0000) knlGS:0000000000000000
[ 4088.289434] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4088.295846] CR2: 0000000000000020 CR3: 0000000001e09005 CR4: 00000000001606e0
[ 4088.303816] Call Trace:
[ 4088.306557] rds_ib_cm_connect_complete+0xe0/0x220 [rds_rdma]
[ 4088.312982] ? __dynamic_pr_debug+0x8c/0xb0
[ 4088.317664] ? __queue_work+0x142/0x3c0
[ 4088.321944] rds_rdma_cm_event_handler+0x19e/0x250 [rds_rdma]
[ 4088.328370] cma_ib_handler+0xcd/0x280 [rdma_cm]
[ 4088.333522] cm_process_work+0x25/0x120 [ib_cm]
[ 4088.338580] cm_work_handler+0xd6b/0x17aa [ib_cm]
[ 4088.343832] process_one_work+0x149/0x360
[ 4088.348307] worker_thread+0x4d/0x3e0
[ 4088.352397] kthread+0x109/0x140
[ 4088.355996] ? rescuer_thread+0x380/0x380
[ 4088.360467] ? kthread_park+0x60/0x60
[ 4088.364563] ret_from_fork+0x25/0x30
[ 4088.368548] Code: 48 89 45 90 48 89 45 98 eb 4d 0f 1f 44 00 00 48 8b 43 08 48 89 d9 48 c7 c2 80 13 52 a0 48 c7 c6 58 21 52 a0 48 c7 c7 80 55 52 a0 <4c> 8b 48 20 44 89 64 24 08 48 8b 40 30 49 83 e1 fc 48 89 04 24
[ 4088.389612] RIP: rds_ib_recv_refill+0x87/0x620 [rds_rdma] RSP: ffffc9000e59bb68
[ 4088.397772] CR2: 0000000000000020
[ 4088.401505] ---[ end trace fe922e6ccf004431 ]---
This bug was provoked by compiling rds out-of-tree with
EXTRA_CFLAGS="-DRDS_DEBUG -DDEBUG" and inserting an artificial delay
between the rdsdebug() and ib_ib_port_recv() statements:
/* XXX when can this fail? */
ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
+ if (can_wait)
+ usleep_range(1000, 5000);
rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
(long) ib_sg_dma_address(
The fix is simply to move the rdsdebug() statement up before the
ib_post_recv() and remove the printing of ret, which is taken care of
anyway by the non-debug code.
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull final power management fixes from Rafael Wysocki:
"These fix a regression in the schedutil cpufreq governor introduced by
a recent change and blacklist Dell XPS13 9360 from using the Low Power
S0 Idle _DSM interface which triggers serious problems on one of these
machines.
Specifics:
- Prevent the schedutil cpufreq governor from using the utilization
of a wrong CPU in some cases which started to happen after one of
the recent changes in it (Chris Redpath).
- Blacklist Dell XPS13 9360 from using the Low Power S0 Idle _DSM
interface as that causes serious issue (related to NVMe) to appear
on one of these machines, even though the other Dells XPS13 9360 in
somewhat different HW configurations behave correctly (Rafael
Wysocki)"
* tag 'pm-final-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / PM: Blacklist Low Power S0 Idle _DSM for Dell XPS13 9360
cpufreq: schedutil: Examine the correct CPU when we update util
Pull sound fixes from Takashi Iwai:
"The amount of the changes isn't as quite small as wished, nevertheless
they are straight fixes that deserve merging to 4.14 final.
Most of fixes are about ALSA core bugs spotted by fuzzer: a follow-up
fix for the previous nested rwsem patch, a fix to avoid the resource
hogs due to too many concurrent ALSA timer invocations, and a fix for
a crash with SYSEX MIDI transfer over OSS sequencer emulation that is
used by none but fuzzer.
The rest are usual HD-audio and USB-audio device-specific quirks,
which are safe to apply"
* tag 'sound-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - fix headset mic problem for Dell machines with alc274
ALSA: seq: Fix OSS sysex delivery in OSS emulation
ALSA: seq: Avoid invalid lockdep class warning
ALSA: timer: Limit max instances per timer
ALSA: usb-audio: support new Amanero Combo384 firmware version
* pm-cpufreq-sched:
cpufreq: schedutil: Examine the correct CPU when we update util
Pull networking fixes from David Miller:
1) Fix use-after-free in IPSEC input parsing, desintation address
pointer was loaded before pskb_may_pull() which can change the SKB
data pointers. From Florian Westphal.
2) Stack out-of-bounds read in xfrm_state_find(), from Steffen
Klassert.
3) IPVS state of SKB is not properly reset when moving between
namespaces, from Ye Yin.
4) Fix crash in asix driver suspend and resume, from Andrey Konovalov.
5) Don't deliver ipv6 l2tp tunnel packets to ipv4 l2tp tunnels, and
vice versa, from Guillaume Nault.
6) Fix DSACK undo on non-dup ACKs, from Priyaranjan Jha.
7) Fix regression in bond_xmit_hash()'s behavior after the TCP port
selection changes back in 4.2, from Hangbin Liu.
8) Two divide by zero bugs in USB networking drivers when parsing
descriptors, from Bjorn Mork.
9) Fix bonding slaves being stuck in BOND_LINK_FAIL state, from Jay
Vosburgh.
10) Missing skb_reset_mac_header() in qmi_wwan, from Kristian Evensen.
11) Fix the destruction of tc action object races properly, from Cong
Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits)
cls_u32: use tcf_exts_get_net() before call_rcu()
cls_tcindex: use tcf_exts_get_net() before call_rcu()
cls_rsvp: use tcf_exts_get_net() before call_rcu()
cls_route: use tcf_exts_get_net() before call_rcu()
cls_matchall: use tcf_exts_get_net() before call_rcu()
cls_fw: use tcf_exts_get_net() before call_rcu()
cls_flower: use tcf_exts_get_net() before call_rcu()
cls_flow: use tcf_exts_get_net() before call_rcu()
cls_cgroup: use tcf_exts_get_net() before call_rcu()
cls_bpf: use tcf_exts_get_net() before call_rcu()
cls_basic: use tcf_exts_get_net() before call_rcu()
net_sched: introduce tcf_exts_get_net() and tcf_exts_put_net()
Revert "net_sched: hold netns refcnt for each action"
net: usb: asix: fill null-ptr-deref in asix_suspend
Revert "net: usb: asix: fill null-ptr-deref in asix_suspend"
qmi_wwan: Add missing skb_reset_mac_header-call
bonding: fix slave stuck in BOND_LINK_FAIL state
qrtr: Move to postcore_initcall
net: qmi_wwan: fix divide by 0 on bad descriptors
net: cdc_ether: fix divide by 0 on bad descriptors
...
Confirmed with Kailang of Realtek, the pin 0x19 is for Headset Mic, and
the pin 0x1a is for Headphone Mic, he suggested to apply
ALC269_FIXUP_DELL1_MIC_NO_PRESENCE to fix this problem. And we
verified applying this FIXUP can fix this problem.
Cc: <stable@vger.kernel.org>
Cc: Kailang Yang <kailang@realtek.com>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
At least one Dell XPS13 9360 is reported to have serious issues with
the Low Power S0 Idle _DSM interface and since this machine model
generally can do ACPI S3 just fine, add a blacklist entry to disable
that interface for Dell XPS13 9360.
Fixes: 8110dd281e15 (ACPI / sleep: EC-based wakeup from suspend-to-idle on recent systems)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196907
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 4.13+ <stable@vger.kernel.org> # 4.13+
After commit 674e75411fc2 (sched: cpufreq: Allow remote cpufreq
callbacks) we stopped to always read the utilization for the CPU we
are running the governor on, and instead we read it for the CPU
which we've been told has updated utilization. This is stored in
sugov_cpu->cpu.
The value is set in sugov_register() but we clear it in sugov_start()
which leads to always looking at the utilization of CPU0 instead of
the correct one.
Fix this by consolidating the initialization code into sugov_start().
Fixes: 674e75411fc2 (sched: cpufreq: Allow remote cpufreq callbacks)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Reviewed-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: Brendan Jackman <brendan.jackman@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>