Linux kernel
============
There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.
In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``. The formatted documentation can also be read online at:
https://www.kernel.org/doc/html/latest/
There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.
code
Clone this repository
https://tangled.org/tjh.dev/kernel
git@gordian.tjh.dev:tjh.dev/kernel
For self-hosted knots, clone URLs may differ based on your setup.
Pull more Kbuild updates from Masahiro Yamada:
- improve boolinit.cocci and use_after_iter.cocci semantic patches
- fix alignment for kallsyms
- move 'asm goto' compiler test to Kconfig and clean up jump_label
CONFIG option
- generate asm-generic wrappers automatically if arch does not
implement mandatory UAPI headers
- remove redundant generic-y defines
- misc cleanups
* tag 'kbuild-v4.21-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: rename generated .*conf-cfg to *conf-cfg
kbuild: remove unnecessary stubs for archheader and archscripts
kbuild: use assignment instead of define ... endef for filechk_* rules
arch: remove redundant UAPI generic-y defines
kbuild: generate asm-generic wrappers if mandatory headers are missing
arch: remove stale comments "UAPI Header export list"
riscv: remove redundant kernel-space generic-y
kbuild: change filechk to surround the given command with { }
kbuild: remove redundant target cleaning on failure
kbuild: clean up rule_dtc_dt_yaml
kbuild: remove UIMAGE_IN and UIMAGE_OUT
jump_label: move 'asm goto' support test to Kconfig
kallsyms: lower alignment on ARM
scripts: coccinelle: boolinit: drop warnings on named constants
scripts: coccinelle: check for redeclaration
kconfig: remove unused "file" field of yylval union
nds32: remove redundant kernel-space generic-y
nios2: remove unneeded HAS_DMA define
Pull perf tooling updates form Ingo Molnar:
"A final batch of perf tooling changes: mostly fixes and small
improvements"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
perf session: Add comment for perf_session__register_idle_thread()
perf thread-stack: Fix thread stack processing for the idle task
perf thread-stack: Allocate an array of thread stacks
perf thread-stack: Factor out thread_stack__init()
perf thread-stack: Allow for a thread stack array
perf thread-stack: Avoid direct reference to the thread's stack
perf thread-stack: Tidy thread_stack__bottom() usage
perf thread-stack: Simplify some code in thread_stack__process()
tools gpio: Allow overriding CFLAGS
tools power turbostat: Override CFLAGS assignments and add LDFLAGS to build command
tools thermal tmon: Allow overriding CFLAGS assignments
tools power x86_energy_perf_policy: Override CFLAGS assignments and add LDFLAGS to build command
perf c2c: Increase the HITM ratio limit for displayed cachelines
perf c2c: Change the default coalesce setup
perf trace beauty ioctl: Beautify USBDEVFS_ commands
perf trace beauty: Export function to get the files for a thread
perf trace: Wire up ioctl's USBDEBFS_ cmd table generator
perf beauty ioctl: Add generator for USBDEVFS_ ioctl commands
tools headers uapi: Grab a copy of usbdevice_fs.h
perf trace: Store the major number for a file when storing its pathname
...
Remove the dot-prefixing since it is just a matter of the
.gitignore file.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The semantics of what "in core" means for the mincore() system call are
somewhat unclear, but Linux has always (since 2.3.52, which is when
mincore() was initially done) treated it as "page is available in page
cache" rather than "page is mapped in the mapping".
The problem with that traditional semantic is that it exposes a lot of
system cache state that it really probably shouldn't, and that users
shouldn't really even care about.
So let's try to avoid that information leak by simply changing the
semantics to be that mincore() counts actual mapped pages, not pages
that might be cheaply mapped if they were faulted (note the "might be"
part of the old semantics: being in the cache doesn't actually guarantee
that you can access them without IO anyway, since things like network
filesystems may have to revalidate the cache before use).
In many ways the old semantics were somewhat insane even aside from the
information leak issue. From the very beginning (and that beginning is
a long time ago: 2.3.52 was released in March 2000, I think), the code
had a comment saying
Later we can get more picky about what "in core" means precisely.
and this is that "later". Admittedly it is much later than is really
comfortable.
NOTE! This is a real semantic change, and it is for example known to
change the output of "fincore", since that program literally does a
mmmap without populating it, and then doing "mincore()" on that mapping
that doesn't actually have any pages in it.
I'm hoping that nobody actually has any workflow that cares, and the
info leak is real.
We may have to do something different if it turns out that people have
valid reasons to want the old semantics, and if we can limit the
information leak sanely.
Cc: Kevin Easton <kevin@guarana.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Masatake YAMATO <yamato@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
perf c2c:
Jiri Olsa:
- Change the default coalesce setup to from '--coalesce pid,iaddr' to just '--coalesce iaddr'.
- Increase the HITM ratio limit for displayed cachelines.
perf script:
Andi Kleen:
- Fix LBR skid dump problems in brstackinsn.
perf trace:
Arnaldo Carvalho de Melo:
- Check if the raw_syscalls:sys_{enter,exit} are setup before setting tp filter.
- Do not hardcode the size of the tracepoint common_ fields.
- Beautify USBDEFFS_ ioctl commands.
Colin Ian King:
- Use correct SECCOMP prefix spelling, "SECOMP_*" -> "SECCOMP_*".
perf python:
Jiri Olsa:
- Do not force closing original perf descriptor in evlist.get_pollfd().
tools misc:
Jiri Olsa:
- Allow overriding CFLAGS and LDFLAGS.
perf build:
Stanislav Fomichev:
- Don't unconditionally link the libbfd feature test to -liberty and -lz
thread-stack:
Adrian Hunter:
- Fix processing for the idle task, having a stack per cpu.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make simply skips a missing rule when it is marked as .PHONY.
Remove the dummy targets.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Commit 594cc251fdd0 ("make 'user_access_begin()' do 'access_ok()'")
broke both alpha and SH booting in qemu, as noticed by Guenter Roeck.
It turns out that the bug wasn't actually in that commit itself (which
would have been surprising: it was mostly a no-op), but in how the
addition of access_ok() to the strncpy_from_user() and strnlen_user()
functions now triggered the case where those functions would test the
access of the very last byte of the user address space.
The string functions actually did that user range test before too, but
they did it manually by just comparing against user_addr_max(). But
with user_access_begin() doing the check (using "access_ok()"), it now
exposed problems in the architecture implementations of that function.
For example, on alpha, the access_ok() helper macro looked like this:
#define __access_ok(addr, size) \
((get_fs().seg & (addr | size | (addr+size))) == 0)
and what it basically tests is of any of the high bits get set (the
USER_DS masking value is 0xfffffc0000000000).
And that's completely wrong for the "addr+size" check. Because it's
off-by-one for the case where we check to the very end of the user
address space, which is exactly what the strn*_user() functions do.
Why? Because "addr+size" will be exactly the size of the address space,
so trying to access the last byte of the user address space will fail
the __access_ok() check, even though it shouldn't. As a result, the
user string accessor functions failed consistently - because they
literally don't know how long the string is going to be, and the max
access is going to be that last byte of the user address space.
Side note: that alpha macro is buggy for another reason too - it re-uses
the arguments twice.
And SH has another version of almost the exact same bug:
#define __addr_ok(addr) \
((unsigned long __force)(addr) < current_thread_info()->addr_limit.seg)
so far so good: yes, a user address must be below the limit. But then:
#define __access_ok(addr, size) \
(__addr_ok((addr) + (size)))
is wrong with the exact same off-by-one case: the case when "addr+size"
is exactly _equal_ to the limit is actually perfectly fine (think "one
byte access at the last address of the user address space")
The SH version is actually seriously buggy in another way: it doesn't
actually check for overflow, even though it did copy the _comment_ that
talks about overflow.
So it turns out that both SH and alpha actually have completely buggy
implementations of access_ok(), but they happened to work in practice
(although the SH overflow one is a serious serious security bug, not
that anybody likely cares about SH security).
This fixes the problems by using a similar macro on both alpha and SH.
It isn't trying to be clever, the end address is based on this logic:
unsigned long __ao_end = __ao_a + __ao_b - !!__ao_b;
which basically says "add start and length, and then subtract one unless
the length was zero". We can't subtract one for a zero length, or we'd
just hit an underflow instead.
For a lot of access_ok() users the length is a constant, so this isn't
actually as expensive as it initially looks.
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Caused by making the variable static:
kernel/sched/fair.c:119:21: warning: 'capacity_margin' defined but not used [-Wunused-variable]
Seems easiest to just move it up under the existing ifdef CONFIG_SMP
that's a few lines above.
Fixes: ed8885a14433a ('sched/fair: Make some variables static')
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a comment to perf_session__register_idle_thread() to bring attention to
a pitfall with the idle task thread structure. The pitfall is that there
should really be a 'struct thread' for the idle task of each cpu, but there
is only one that can have pid == tid == 0.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20181221120620.9659-9-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
You do not have to use define ... endef for filechk_* rules.
For simple cases, the use of assignment looks cleaner, IMHO.
I updated the usage for scripts/Kbuild.include in case somebody
misunderstands the 'define ... endif' is the requirement.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Pull fscrypt updates from Ted Ts'o:
"Add Adiantum support for fscrypt"
* tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
fscrypt: add Adiantum support
Pull x86 platform update from Ingo Molnar:
"An OLPC platform support simplification patch"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/olpc: Do not call of_platform_bus_probe()
perf creates a single 'struct thread' to represent the idle task. That
is because threads are identified by PID and TID, and the idle task
always has PID == TID == 0.
However, there are actually separate idle tasks for each CPU. That
creates a problem for thread stack processing which assumes that each
thread has a single stack, not one stack per CPU.
Fix that by passing through the CPU number, and in the case of the idle
"thread", pick the thread stack from an array based on the CPU number.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20181221120620.9659-8-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now that Kbuild automatically creates asm-generic wrappers for missing
mandatory headers, it is redundant to list the same headers in
generic-y and mandatory-y.
Suggested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>