	     Semantics and Behavior of Local Atomic Operations

			    Mathieu Desnoyers


	This document explains the purpose of the local atomic operations, how
to implement them for any given architecture and shows how they can be used
properly. It also stresses the precautions that must be taken when reading
those local variables across CPUs when the order of memory writes matters.


* Purpose of local atomic operations

Local atomic operations are meant to provide fast and highly reentrant per CPU
counters. They minimize the performance cost of standard atomic operations by
removing the LOCK prefix and memory barriers normally required to synchronize
across CPUs.

Having fast per CPU atomic counters is interesting in many cases: it does not
require disabling interrupts to protect from interrupt handlers and it permits
coherent counters in NMI handlers. It is especially useful for tracing purposes
and for various performance monitoring counters.

Local atomic operations only guarantee variable modification atomicity wrt the
CPU which owns the data. Therefore, care must be taken to make sure that only
one CPU writes to the local_t data. This is done by using per cpu data and
making sure that we modify it from within a preemption safe context. It is
however permitted to read local_t data from any CPU: it will then appear to be
written out of order wrt other memory writes on the owner CPU.


* Implementation for a given architecture

This can be done by slightly modifying the standard atomic operations: only
their UP variant must be kept. It typically means removing the LOCK prefix (on
i386 and x86_64) and any SMP synchronization barrier.
If the architecture does not have a different behavior between SMP and UP,
including asm-generic/local.h in your architecture's local.h is sufficient.

The local_t type is defined as an opaque signed long by embedding an
atomic_long_t inside a structure. This is done so that a cast from this type
to a long fails. The definition looks like:

typedef struct { atomic_long_t a; } local_t;


* How to use local atomic operations

#include <linux/percpu.h>
#include <asm/local.h>

static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);


* Counting

Counting is done on all the bits of a signed long.

In preemptible context, use get_cpu_var() and put_cpu_var() around local
atomic operations: it makes sure that preemption is disabled around write
access to the per cpu variable. For instance:

	local_inc(&get_cpu_var(counters));
	put_cpu_var(counters);

If you are already in a preemption-safe context, you can directly use
__get_cpu_var() instead.

	local_inc(&__get_cpu_var(counters));


* Reading the counters

Those local counters can be read from foreign CPUs to sum the count. Note that
the data seen by local_read across CPUs must be considered to be out of order
relative to other memory writes happening on the CPU that owns the data.

	long sum = 0;
	for_each_online_cpu(cpu)
		sum += local_read(&per_cpu(counters, cpu));

If you want to use a remote local_read to synchronize access to a resource
between CPUs, explicit smp_wmb() and smp_rmb() memory barriers must be used
respectively on the writer and the reader CPUs.
This would be the case if you used the local_t variable as a counter of bytes
written in a buffer: there should be a smp_wmb() between the buffer write and
the counter increment and also a smp_rmb() between the counter read and the
buffer read.


Here is a sample module which implements a basic per cpu counter using
local.h.

--- BEGIN ---
/* test-local.c
 *
 * Sample module for local.h usage.
 */

#include <asm/local.h>
#include <linux/module.h>
#include <linux/timer.h>

static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);

static struct timer_list test_timer;

/* IPI called on each CPU. */
static void test_each(void *info)
{
	/* Increment the counter from a non preemptible context */
	printk("Increment on cpu %d\n", smp_processor_id());
	local_inc(&__get_cpu_var(counters));

	/* This is what incrementing the variable would look like within a
	 * preemptible context (it disables preemption):
	 *
	 * local_inc(&get_cpu_var(counters));
	 * put_cpu_var(counters);
	 */
}

static void do_test_timer(unsigned long data)
{
	int cpu;

	/* Increment the counters */
	on_each_cpu(test_each, NULL, 0, 1);
	/* Read all the counters */
	printk("Counters read from CPU %d\n", smp_processor_id());
	for_each_online_cpu(cpu) {
		printk("Read: CPU %d, count %ld\n", cpu,
			local_read(&per_cpu(counters, cpu)));
	}
	del_timer(&test_timer);
	test_timer.expires = jiffies + 1000;
	add_timer(&test_timer);
}

static int __init test_init(void)
{
	/* initialize the timer that will increment the counter */
	init_timer(&test_timer);
	test_timer.function = do_test_timer;
	test_timer.expires = jiffies + 1;
	add_timer(&test_timer);

	return 0;
}

static void __exit test_exit(void)
{
	del_timer_sync(&test_timer);
}

module_init(test_init);
module_exit(test_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Mathieu Desnoyers");
MODULE_DESCRIPTION("Local Atomic Ops");
--- END ---