Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Documentation/local_ops.txt: convert to ReST markup

... and move to core-api folder.

Signed-off-by: Silvio Fricke <silvio.fricke@gmail.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

authored by Silvio Fricke and committed by Jonathan Corbet
c232694e c3cbf1a7

+207 -191
+1
Documentation/core-api/index.rst
···
     :maxdepth: 1

     assoc_array
+    local_ops
     workqueue

 .. only:: subproject
+206
Documentation/core-api/local_ops.rst

.. _local_ops:

=================================================
Semantics and Behavior of Local Atomic Operations
=================================================

:Author: Mathieu Desnoyers


This document explains the purpose of local atomic operations, how to
implement them for any given architecture, and shows how they can be used
properly. It also stresses the precautions that must be taken when reading
those local variables across CPUs when the order of memory writes matters.

.. note::

    Note that ``local_t`` based operations are not recommended for general
    kernel use. Please use the ``this_cpu`` operations instead unless there is
    really a special purpose. Most uses of ``local_t`` in the kernel have been
    replaced by ``this_cpu`` operations. ``this_cpu`` operations combine the
    relocation with the ``local_t``-like semantics in a single instruction and
    yield more compact and faster executing code.


Purpose of local atomic operations
==================================

Local atomic operations are meant to provide fast and highly reentrant per-CPU
counters. They minimize the performance cost of standard atomic operations by
removing the LOCK prefix and memory barriers normally required to synchronize
across CPUs.

Having fast per-CPU atomic counters is interesting in many cases: it does not
require disabling interrupts to protect from interrupt handlers and it permits
coherent counters in NMI handlers. It is especially useful for tracing purposes
and for various performance monitoring counters.

Local atomic operations only guarantee variable modification atomicity wrt the
CPU which owns the data. Therefore, care must be taken to make sure that only
one CPU writes to the ``local_t`` data. This is done by using per-CPU data and
making sure that we modify it from within a preemption-safe context. It is
however permitted to read ``local_t`` data from any CPU: it will then appear
to be written out of order wrt other memory writes by the owner CPU.


Implementation for a given architecture
=======================================

It can be done by slightly modifying the standard atomic operations: only
their UP variant must be kept. It typically means removing the LOCK prefix (on
i386 and x86_64) and any SMP synchronization barrier. If the architecture does
not have a different behavior between SMP and UP, including
``asm-generic/local.h`` in your architecture's ``local.h`` is sufficient.

The ``local_t`` type is defined as an opaque ``signed long`` by embedding an
``atomic_long_t`` inside a structure. This is made so that a cast from this
type to a ``long`` fails. The definition looks like::

    typedef struct { atomic_long_t a; } local_t;


Rules to follow when using local atomic operations
==================================================

* Variables touched by local ops must be per-CPU variables.
* *Only* the CPU owner of these variables must write to them.
* This CPU can use local ops from any context (process, irq, softirq, nmi, ...)
  to update its ``local_t`` variables.
* Preemption (or interrupts) must be disabled when using local ops in
  process context to make sure the process won't be migrated to a
  different CPU between getting the per-cpu variable and doing the
  actual local op.
* When using local ops in interrupt context, no special care must be
  taken on a mainline kernel, since they will run on the local CPU with
  preemption already disabled. I suggest, however, explicitly disabling
  preemption anyway to make sure it will still work correctly on
  -rt kernels.
* Reading the local cpu variable will provide the current copy of the
  variable.
* Reads of these variables can be done from any CPU, because updates to
  "``long``", aligned, variables are always atomic. Since no memory
  synchronization is done by the writer CPU, an outdated copy of the
  variable can be read when reading some *other* cpu's variables.


How to use local atomic operations
==================================

::

    #include <linux/percpu.h>
    #include <asm/local.h>

    static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);


Counting
========

Counting is done on all the bits of a signed long.

In preemptible context, use ``get_cpu_var()`` and ``put_cpu_var()`` around
local atomic operations: it makes sure that preemption is disabled around
write access to the per-CPU variable. For instance::

    local_inc(&get_cpu_var(counters));
    put_cpu_var(counters);

If you are already in a preemption-safe context, you can use
``this_cpu_ptr()`` instead::

    local_inc(this_cpu_ptr(&counters));


Reading the counters
====================

Those local counters can be read from foreign CPUs to sum the count. Note that
the data seen by ``local_read()`` across CPUs must be considered to be out of
order relative to other memory writes happening on the CPU that owns the
data::

    long sum = 0;
    for_each_online_cpu(cpu)
        sum += local_read(&per_cpu(counters, cpu));

If you want to use a remote ``local_read()`` to synchronize access to a
resource between CPUs, explicit ``smp_wmb()`` and ``smp_rmb()`` memory
barriers must be used respectively on the writer and the reader CPUs. It would
be the case if you use the ``local_t`` variable as a counter of bytes written
in a buffer: there should be a ``smp_wmb()`` between the buffer write and the
counter increment and also a ``smp_rmb()`` between the counter read and the
buffer read.


Here is a sample module which implements a basic per-CPU counter using
``local.h``::

    /* test-local.c
     *
     * Sample module for local.h usage.
     */


    #include <asm/local.h>
    #include <linux/module.h>
    #include <linux/timer.h>

    static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);

    static struct timer_list test_timer;

    /* IPI called on each CPU. */
    static void test_each(void *info)
    {
            /* Increment the counter from a non-preemptible context */
            printk("Increment on cpu %d\n", smp_processor_id());
            local_inc(this_cpu_ptr(&counters));

            /* This is what incrementing the variable would look like within a
             * preemptible context (it disables preemption):
             *
             * local_inc(&get_cpu_var(counters));
             * put_cpu_var(counters);
             */
    }

    static void do_test_timer(unsigned long data)
    {
            int cpu;

            /* Increment the counters */
            on_each_cpu(test_each, NULL, 1);
            /* Read all the counters */
            printk("Counters read from CPU %d\n", smp_processor_id());
            for_each_online_cpu(cpu) {
                    printk("Read : CPU %d, count %ld\n", cpu,
                            local_read(&per_cpu(counters, cpu)));
            }
            del_timer(&test_timer);
            test_timer.expires = jiffies + 1000;
            add_timer(&test_timer);
    }

    static int __init test_init(void)
    {
            /* initialize the timer that will increment the counter */
            init_timer(&test_timer);
            test_timer.function = do_test_timer;
            test_timer.expires = jiffies + 1;
            add_timer(&test_timer);

            return 0;
    }

    static void __exit test_exit(void)
    {
            del_timer_sync(&test_timer);
    }

    module_init(test_init);
    module_exit(test_exit);

    MODULE_LICENSE("GPL");
    MODULE_AUTHOR("Mathieu Desnoyers");
    MODULE_DESCRIPTION("Local Atomic Ops");
-191
Documentation/local_ops.txt
··· (removed: the previous plain-text version of the same document, converted to ReST above)