		Linux kernel coding style

This is a short document describing the preferred coding style for the
linux kernel. Coding style is very personal, and I won't _force_ my
views on anybody, but this is what goes for anything that I have to be
able to maintain, and I'd prefer it for most other things too. Please
at least consider the points made here.

First off, I'd suggest printing out a copy of the GNU coding standards,
and NOT read it. Burn them, it's a great symbolic gesture.

Anyway, here goes:


		Chapter 1: Indentation

Tabs are 8 characters, and thus indentations are also 8 characters.
There are heretic movements that try to make indentations 4 (or even 2!)
characters deep, and that is akin to trying to define the value of PI to
be 3.

Rationale: The whole idea behind indentation is to clearly define where
a block of control starts and ends. Especially when you've been looking
at your screen for 20 straight hours, you'll find it a lot easier to see
how the indentation works if you have large indentations.

Now, some people will claim that having 8-character indentations makes
the code move too far to the right, and makes it hard to read on a
80-character terminal screen. The answer to that is that if you need
more than 3 levels of indentation, you're screwed anyway, and should fix
your program.

In short, 8-char indents make things easier to read, and have the added
benefit of warning you when you're nesting your functions too deep.
Heed that warning.

The preferred way to ease multiple indentation levels in a switch statement is
to align the "switch" and its subordinate "case" labels in the same column
instead of "double-indenting" the "case" labels.
E.g.:

	switch (suffix) {
	case 'G':
	case 'g':
		mem <<= 30;
		break;
	case 'M':
	case 'm':
		mem <<= 20;
		break;
	case 'K':
	case 'k':
		mem <<= 10;
		/* fall through */
	default:
		break;
	}


Don't put multiple statements on a single line unless you have
something to hide:

	if (condition) do_this;
	  do_something_everytime;

Don't put multiple assignments on a single line either. Kernel coding style
is super simple. Avoid tricky expressions.

Outside of comments, documentation and except in Kconfig, spaces are never
used for indentation, and the above example is deliberately broken.

Get a decent editor and don't leave whitespace at the end of lines.


		Chapter 2: Breaking long lines and strings

Coding style is all about readability and maintainability using commonly
available tools.

The limit on the length of lines is 80 columns and this is a hard limit.

Statements longer than 80 columns will be broken into sensible chunks.
Descendants are always substantially shorter than the parent and are placed
substantially to the right. The same applies to function headers with a long
argument list. Long strings are broken into shorter strings as well.

void fun(int a, int b, int c)
{
	if (condition)
		printk(KERN_WARNING "Warning this is a long printk with "
						"3 parameters a: %u b: %u "
						"c: %u \n", a, b, c);
	else
		next_statement;
}

		Chapter 3: Placing Braces and Spaces

The other issue that always comes up in C styling is the placement of
braces. Unlike the indent size, there are few technical reasons to
choose one placement strategy over the other, but the preferred way, as
shown to us by the prophets Kernighan and Ritchie, is to put the opening
brace last on the line, and put the closing brace first, thusly:

	if (x is true) {
		we do y
	}

This applies to all non-function statement blocks (if, switch, for,
while, do).
E.g.:

	switch (action) {
	case KOBJ_ADD:
		return "add";
	case KOBJ_REMOVE:
		return "remove";
	case KOBJ_CHANGE:
		return "change";
	default:
		return NULL;
	}

However, there is one special case, namely functions: they have the
opening brace at the beginning of the next line, thus:

	int function(int x)
	{
		body of function
	}

Heretic people all over the world have claimed that this inconsistency
is ... well ... inconsistent, but all right-thinking people know that
(a) K&R are _right_ and (b) K&R are right. Besides, functions are
special anyway (you can't nest them in C).

Note that the closing brace is empty on a line of its own, _except_ in
the cases where it is followed by a continuation of the same statement,
i.e. a "while" in a do-statement or an "else" in an if-statement, like
this:

	do {
		body of do-loop
	} while (condition);

and

	if (x == y) {
		..
	} else if (x > y) {
		...
	} else {
		....
	}

Rationale: K&R.

Also, note that this brace-placement also minimizes the number of empty
(or almost empty) lines, without any loss of readability. Thus, as the
supply of new-lines on your screen is not a renewable resource (think
25-line terminal screens here), you have more empty lines to put
comments on.

Do not unnecessarily use braces where a single statement will do.

if (condition)
	action();

This does not apply if only one branch of a conditional statement is a
single statement; in that case, use braces in both branches.

if (condition) {
	do_this();
	do_that();
} else {
	otherwise();
}

		3.1: Spaces

Linux kernel style for use of spaces depends (mostly) on
function-versus-keyword usage. Use a space after (most) keywords.
The notable exceptions are sizeof, typeof, alignof, and __attribute__, which
look somewhat like functions (and are usually used with parentheses in Linux,
although they are not required in the language, as in: "sizeof info" after
"struct fileinfo info;" is declared).

So use a space after these keywords:
	if, switch, case, for, do, while
but not with sizeof, typeof, alignof, or __attribute__. E.g.,
	s = sizeof(struct file);

Do not add spaces around (inside) parenthesized expressions. This example is
*bad*:

	s = sizeof( struct file );

When declaring pointer data or a function that returns a pointer type, the
preferred use of '*' is adjacent to the data name or function name and not
adjacent to the type name. Examples:

	char *linux_banner;
	unsigned long long memparse(char *ptr, char **retptr);
	char *match_strdup(substring_t *s);

Use one space around (on each side of) most binary and ternary operators,
such as any of these:

	=  +  -  <  >  *  /  %  |  &  ^  <=  >=  ==  !=  ?  :

but no space after unary operators:
	&  *  +  -  ~  !  sizeof  typeof  alignof  __attribute__  defined

no space before the postfix increment & decrement unary operators:
	++  --

no space after the prefix increment & decrement unary operators:
	++  --

and no space around the '.' and "->" structure member operators.


		Chapter 4: Naming

C is a Spartan language, and so should your naming be. Unlike Modula-2
and Pascal programmers, C programmers do not use cute names like
ThisVariableIsATemporaryCounter. A C programmer would call that
variable "tmp", which is much easier to write, and not the least more
difficult to understand.

HOWEVER, while mixed-case names are frowned upon, descriptive names for
global variables are a must. To call a global function "foo" is a
shooting offense.

GLOBAL variables (to be used only if you _really_ need them) need to
have descriptive names, as do global functions. If you have a function
that counts the number of active users, you should call that
"count_active_users()" or similar, you should _not_ call it "cntusr()".

Encoding the type of a function into the name (so-called Hungarian
notation) is brain damaged - the compiler knows the types anyway and can
check those, and it only confuses the programmer. No wonder MicroSoft
makes buggy programs.

LOCAL variable names should be short, and to the point. If you have
some random integer loop counter, it should probably be called "i".
Calling it "loop_counter" is non-productive, if there is no chance of it
being mis-understood. Similarly, "tmp" can be just about any type of
variable that is used to hold a temporary value.

If you are afraid to mix up your local variable names, you have another
problem, which is called the function-growth-hormone-imbalance syndrome.
See chapter 6 (Functions).


		Chapter 5: Typedefs

Please don't use things like "vps_t".

It's a _mistake_ to use typedef for structures and pointers. When you see a

	vps_t a;

in the source, what does it mean?

In contrast, if it says

	struct virtual_container *a;

you can actually tell what "a" is.

Lots of people think that typedefs "help readability". Not so. They are
useful only for:

 (a) totally opaque objects (where the typedef is actively used to _hide_
     what the object is).

     Example: "pte_t" etc. opaque objects that you can only access using
     the proper accessor functions.

     NOTE! Opaqueness and "accessor functions" are not good in themselves.
     The reason we have them for things like pte_t etc. is that there
     really is absolutely _zero_ portably accessible information there.

 (b) Clear integer types, where the abstraction _helps_ avoid confusion
     whether it is "int" or "long".

     u8/u16/u32 are perfectly fine typedefs, although they fit into
     category (d) better than here.

     NOTE! Again - there needs to be a _reason_ for this. If something is
     "unsigned long", then there's no reason to do

	typedef unsigned long myflags_t;

     but if there is a clear reason for why it under certain circumstances
     might be an "unsigned int" and under other configurations might be
     "unsigned long", then by all means go ahead and use a typedef.

 (c) when you use sparse to literally create a _new_ type for
     type-checking.

 (d) New types which are identical to standard C99 types, in certain
     exceptional circumstances.

     Although it would only take a short amount of time for the eyes and
     brain to become accustomed to the standard types like 'uint32_t',
     some people object to their use anyway.

     Therefore, the Linux-specific 'u8/u16/u32/u64' types and their
     signed equivalents which are identical to standard types are
     permitted -- although they are not mandatory in new code of your
     own.

     When editing existing code which already uses one or the other set
     of types, you should conform to the existing choices in that code.

 (e) Types safe for use in userspace.

     In certain structures which are visible to userspace, we cannot
     require C99 types and cannot use the 'u32' form above. Thus, we
     use __u32 and similar types in all structures which are shared
     with userspace.

Maybe there are other cases too, but the rule should basically be to NEVER
EVER use a typedef unless you can clearly match one of those rules.

In general, a pointer, or a struct that has elements that can reasonably
be directly accessed should _never_ be a typedef.
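
As a rough illustration of the rules above, the following userspace sketch
spells a structure out as a plain struct and reserves the typedef for an
integer whose width depends on configuration (rule (b)). The members of
struct virtual_container, the WIDE_FLAGS macro, vc_flags_t and
vc_device_count() are all invented for this example and do not exist in
the kernel.

```c
#include <stddef.h>

/* Hypothetical structure: kept as a plain struct so readers see what it is. */
struct virtual_container {
	int id;
	size_t nr_devices;
};

/*
 * Hypothetical rule (b) case: the width depends on configuration
 * (WIDE_FLAGS is an invented macro), so the typedef hides a real
 * difference instead of just renaming "unsigned long".
 */
#ifdef WIDE_FLAGS
typedef unsigned long vc_flags_t;
#else
typedef unsigned int vc_flags_t;
#endif

/* The parameter type tells the reader this is a pointer to a struct. */
static size_t vc_device_count(const struct virtual_container *vc)
{
	return vc->nr_devices;
}
```

A caller declares "struct virtual_container vc;" and passes "&vc", so the
kind of object is visible at every use.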


		Chapter 6: Functions

Functions should be short and sweet, and do just one thing. They should
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,
as we all know), and do one thing and do that well.

The maximum length of a function is inversely proportional to the
complexity and indentation level of that function. So, if you have a
conceptually simple function that is just one long (but simple)
case-statement, where you have to do lots of small things for a lot of
different cases, it's OK to have a longer function.

However, if you have a complex function, and you suspect that a
less-than-gifted first-year high-school student might not even
understand what the function is all about, you should adhere to the
maximum limits all the more closely. Use helper functions with
descriptive names (you can ask the compiler to in-line them if you think
it's performance-critical, and it will probably do a better job of it
than you would have done).

Another measure of the function is the number of local variables. They
shouldn't exceed 5-10, or you're doing something wrong. Re-think the
function, and split it into smaller pieces. A human brain can
generally easily keep track of about 7 different things, anything more
and it gets confused. You know you're brilliant, but maybe you'd like
to understand what you did 2 weeks from now.

In source files, separate functions with one blank line. If the function is
exported, the EXPORT* macro for it should follow immediately after the closing
function brace line. E.g.:

int system_is_up(void)
{
	return system_state == SYSTEM_RUNNING;
}
EXPORT_SYMBOL(system_is_up);

In function prototypes, include parameter names with their data types.
Although this is not required by the C language, it is preferred in Linux
because it is a simple way to add valuable information for the reader.
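
To make the point about prototypes concrete, here is a small userspace
sketch. The function copy_bounded() and its parameters are made up for
this example; it is not a kernel API.

```c
#include <stddef.h>

/* Prototype with parameter names: each argument explains itself. */
int copy_bounded(char *dst, size_t dst_len, const char *src);

/*
 * The nameless form "int copy_bounded(char *, size_t, const char *);"
 * is equally valid C, but the reader has to guess what the size
 * refers to and which pointer is the destination.
 */

int copy_bounded(char *dst, size_t dst_len, const char *src)
{
	size_t i;

	if (dst_len == 0)
		return -1;

	/* Copy at most dst_len - 1 characters, then terminate. */
	for (i = 0; i + 1 < dst_len && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	/* 0 on a full copy, -1 if the source had to be truncated. */
	return src[i] == '\0' ? 0 : -1;
}
```

The function stays short and uses a single local variable, in line with
the limits suggested earlier in this chapter.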


		Chapter 7: Centralized exiting of functions

Albeit deprecated by some people, the equivalent of the goto statement is
used frequently by compilers in the form of the unconditional jump
instruction.

The goto statement comes in handy when a function exits from multiple
locations and some common work such as cleanup has to be done.

The rationale is:

- unconditional statements are easier to understand and follow
- nesting is reduced
- errors by not updating individual exit points when making
    modifications are prevented
- saves the compiler work to optimize redundant code away ;)

int fun(int a)
{
	int result = 0;
	char *buffer = kmalloc(SIZE, GFP_KERNEL);

	if (buffer == NULL)
		return -ENOMEM;

	if (condition1) {
		while (loop1) {
			...
		}
		result = 1;
		goto out;
	}
	...
out:
	kfree(buffer);
	return result;
}

		Chapter 8: Commenting

Comments are good, but there is also a danger of over-commenting. NEVER
try to explain HOW your code works in a comment: it's much better to
write the code so that the _working_ is obvious, and it's a waste of
time to explain badly written code.

Generally, you want your comments to tell WHAT your code does, not HOW.
Also, try to avoid putting comments inside a function body: if the
function is so complex that you need to separately comment parts of it,
you should probably go back to chapter 6 for a while. You can make
small comments to note or warn about something particularly clever (or
ugly), but try to avoid excess. Instead, put the comments at the head
of the function, telling people what it does, and possibly WHY it does
it.

When commenting the kernel API functions, please use the kernel-doc format.
See the files Documentation/kernel-doc-nano-HOWTO.txt and scripts/kernel-doc
for details.

Linux style for comments is the C89 "/* ... */" style.
Don't use C99-style "// ..." comments.

The preferred style for long (multi-line) comments is:

	/*
	 * This is the preferred style for multi-line
	 * comments in the Linux kernel source code.
	 * Please use it consistently.
	 *
	 * Description: A column of asterisks on the left side,
	 * with beginning and ending almost-blank lines.
	 */

It's also important to comment data, whether they are basic types or derived
types. To this end, use just one data declaration per line (no commas for
multiple data declarations). This leaves you room for a small comment on each
item, explaining its use.


		Chapter 9: You've made a mess of it

That's OK, we all do. You've probably been told by your long-time Unix
user helper that "GNU emacs" automatically formats the C sources for
you, and you've noticed that yes, it does do that, but the defaults it
uses are less than desirable (in fact, they are worse than random
typing - an infinite number of monkeys typing into GNU emacs would never
make a good program).

So, you can either get rid of GNU emacs, or change it to use saner
values. To do the latter, you can stick the following in your .emacs file:

(defun linux-c-mode ()
  "C mode with adjusted defaults for use with the Linux kernel."
  (interactive)
  (c-mode)
  (c-set-style "K&R")
  (setq tab-width 8)
  (setq indent-tabs-mode t)
  (setq c-basic-offset 8))

This will define the M-x linux-c-mode command. When hacking on a
module, if you put the string -*- linux-c -*- somewhere on the first
two lines, this mode will be automatically invoked. Also, you may want
to add

(setq auto-mode-alist (cons '("/usr/src/linux.*/.*\\.[ch]$" . linux-c-mode)
			auto-mode-alist))

to your .emacs file if you want to have linux-c-mode switched on
automagically when you edit source files under /usr/src/linux.

But even if you fail in getting emacs to do sane formatting, not
everything is lost: use "indent".

Now, again, GNU indent has the same brain-dead settings that GNU emacs
has, which is why you need to give it a few command line options.
However, that's not too bad, because even the makers of GNU indent
recognize the authority of K&R (the GNU people aren't evil, they are
just severely misguided in this matter), so you just give indent the
options "-kr -i8" (stands for "K&R, 8 character indents"), or use
"scripts/Lindent", which indents in the latest style.

"indent" has a lot of options, and especially when it comes to comment
re-formatting you may want to take a look at the man page. But
remember: "indent" is not a fix for bad programming.


		Chapter 10: Configuration-files

For configuration options (arch/xxx/Kconfig, and all the Kconfig files),
somewhat different indentation is used.

Help text is indented with 2 spaces.

if CONFIG_EXPERIMENTAL
	tristate CONFIG_BOOM
	default n
	help
	  Apply nitroglycerine inside the keyboard (DANGEROUS)
	bool CONFIG_CHEER
	depends on CONFIG_BOOM
	default y
	help
	  Output nice messages when you explode
endif

Generally, CONFIG_EXPERIMENTAL should surround all options not considered
stable. All options that are known to trash data (experimental
write-support for file-systems, for instance) should be denoted (DANGEROUS),
other experimental options should be denoted (EXPERIMENTAL).


		Chapter 11: Data structures

Data structures that have visibility outside the single-threaded
environment they are created and destroyed in should always have
reference counts. In the kernel, garbage collection doesn't exist (and
outside the kernel garbage collection is slow and inefficient), which
means that you absolutely _have_ to reference count all your uses.
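
As a rough illustration of the rule above, here is a minimal userspace
sketch built on C11 atomics. The names struct session, session_create(),
session_get() and session_put() are invented for this example; kernel
code would use the kernel's own atomic and reference counting helpers
instead of <stdatomic.h> and malloc().

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical object that several users may hold at once. */
struct session {
	atomic_int refcount;
	int id;
};

/* Allocate a session; the caller holds the first reference. */
static struct session *session_create(int id)
{
	struct session *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	atomic_init(&s->refcount, 1);
	s->id = id;
	return s;
}

/* Take an extra reference before handing the pointer to another user. */
static struct session *session_get(struct session *s)
{
	atomic_fetch_add(&s->refcount, 1);
	return s;
}

/*
 * Drop a reference; whoever drops the last one frees the object.
 * Returns 1 if the object was freed, 0 otherwise.
 */
static int session_put(struct session *s)
{
	if (atomic_fetch_sub(&s->refcount, 1) == 1) {
		free(s);
		return 1;
	}
	return 0;
}
```

Every session_get() must be paired with exactly one session_put(), and a
user must not touch the object after its own put. The count only manages
the lifetime; it does not protect the "id" field against concurrent
modification.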

Reference counting means that you can avoid locking, and allows multiple
users to have access to the data structure in parallel - and not having
to worry about the structure suddenly going away from under them just
because they slept or did something else for a while.

Note that locking is _not_ a replacement for reference counting.
Locking is used to keep data structures coherent, while reference
counting is a memory management technique. Usually both are needed, and
they are not to be confused with each other.

Many data structures can indeed have two levels of reference counting,
when there are users of different "classes". The subclass count counts
the number of subclass users, and decrements the global count just once
when the subclass count goes to zero.

Examples of this kind of "multi-level-reference-counting" can be found in
memory management ("struct mm_struct": mm_users and mm_count), and in
filesystem code ("struct super_block": s_count and s_active).

Remember: if another thread can find your data structure, and you don't
have a reference count on it, you almost certainly have a bug.


		Chapter 12: Macros, Enums and RTL

Names of macros defining constants and labels in enums are capitalized.

#define CONSTANT 0x12345

Enums are preferred when defining several related constants.

CAPITALIZED macro names are appreciated but macros resembling functions
may be named in lower case.

Generally, inline functions are preferable to macros resembling functions.

Macros with multiple statements should be enclosed in a do - while block:

#define macrofun(a, b, c)			\
	do {					\
		if ((a) == 5)			\
			do_this(b, c);		\
	} while (0)

Things to avoid when using macros:

1) macros that affect control flow:

#define FOO(x)					\
	do {					\
		if (blah(x) < 0)		\
			return -EBUGGERED;	\
	} while (0)

is a _very_ bad idea.
It looks like a function call but exits the "calling"
function; don't break the internal parsers of those who will read the code.

2) macros that depend on having a local variable with a magic name:

#define FOO(val) bar(index, val)

might look like a good thing, but it's confusing as hell when one reads the
code and it's prone to breakage from seemingly innocent changes.

3) macros with arguments that are used as l-values: FOO(x) = y; will
bite you if somebody e.g. turns FOO into an inline function.

4) forgetting about precedence: macros defining constants using expressions
must enclose the expression in parentheses. Beware of similar issues with
macros using parameters.

#define CONSTANT 0x4000
#define CONSTEXP (CONSTANT | 3)

The cpp manual deals with macros exhaustively. The gcc internals manual also
covers RTL which is used frequently with assembly language in the kernel.


		Chapter 13: Printing kernel messages

Kernel developers like to be seen as literate. Do mind the spelling
of kernel messages to make a good impression. Do not use crippled
words like "dont" and use "do not" or "don't" instead.

Kernel messages do not have to be terminated with a period.

Printing numbers in parentheses (%d) adds no value and should be avoided.


		Chapter 14: Allocating memory

The kernel provides the following general purpose memory allocators:
kmalloc(), kzalloc(), kcalloc(), and vmalloc(). Please refer to the API
documentation for further information about them.

The preferred form for passing a size of a struct is the following:

	p = kmalloc(sizeof(*p), ...);

The alternative form where struct name is spelled out hurts readability and
introduces an opportunity for a bug when the pointer variable type is changed
but the corresponding sizeof that is passed to a memory allocator is not.
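
The same idiom can be tried outside the kernel. The sketch below uses
malloc() as a stand-in for kmalloc(); the members of struct fileinfo and
the helper fileinfo_alloc() are invented for the example.

```c
#include <stdlib.h>

/* Hypothetical structure used only for this illustration. */
struct fileinfo {
	int mode;
	long size;
};

static struct fileinfo *fileinfo_alloc(void)
{
	struct fileinfo *p;

	/*
	 * sizeof(*p) is derived from the pointer itself, so the size
	 * stays correct even if the type of p is changed later.
	 */
	p = malloc(sizeof(*p));
	if (!p)
		return NULL;

	p->mode = 0;
	p->size = 0;
	return p;
}
```

Had the call been written as malloc(sizeof(struct fileinfo)), changing
the declaration of p to another struct type would compile without
complaint and silently allocate the wrong amount of memory.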

Casting the return value which is a void pointer is redundant. The conversion
from void pointer to any other pointer type is guaranteed by the C programming
language.


		Chapter 15: The inline disease

There appears to be a common misperception that gcc has a magic "make me
faster" speedup option called "inline". While the use of inlines can be
appropriate (for example as a means of replacing macros, see Chapter 12), it
very often is not. Abundant use of the inline keyword leads to a much bigger
kernel, which in turn slows the system as a whole down, due to a bigger
icache footprint for the CPU and simply because there is less memory
available for the pagecache. Just think about it; a pagecache miss causes a
disk seek, which easily takes 5 milliseconds. There are a LOT of cpu cycles
that can go into these 5 milliseconds.

A reasonable rule of thumb is to not put inline at functions that have more
than 3 lines of code in them. The exceptions to this rule are the cases where
a parameter is known to be a compile-time constant, and as a result of this
constantness you *know* the compiler will be able to optimize most of your
function away at compile time. For a good example of this latter case, see
the kmalloc() inline function.

Often people argue that adding inline to functions that are static and used
only once is always a win since there is no space tradeoff. While this is
technically correct, gcc is capable of inlining these automatically without
help, and the maintenance issue of removing the inline when a second user
appears outweighs the potential value of the hint that tells gcc to do
something it would have done anyway.


		Chapter 16: Function return values and names

Functions can return values of many different kinds, and one of the
most common is a value indicating whether the function succeeded or
failed.
Such a value can be represented as an error-code integer
(-Exxx = failure, 0 = success) or a "succeeded" boolean (0 = failure,
non-zero = success).

Mixing up these two sorts of representations is a fertile source of
difficult-to-find bugs. If the C language included a strong distinction
between integers and booleans then the compiler would find these mistakes
for us... but it doesn't. To help prevent such bugs, always follow this
convention:

	If the name of a function is an action or an imperative command,
	the function should return an error-code integer. If the name
	is a predicate, the function should return a "succeeded" boolean.

For example, "add work" is a command, and the add_work() function returns 0
for success or -EBUSY for failure. In the same way, "PCI device present" is
a predicate, and the pci_dev_present() function returns 1 if it succeeds in
finding a matching device or 0 if it doesn't.

All EXPORTed functions must respect this convention, and so should all
public functions. Private (static) functions need not, but it is
recommended that they do.

Functions whose return value is the actual result of a computation, rather
than an indication of whether the computation succeeded, are not subject to
this rule. Generally they indicate failure by returning some out-of-range
result. Typical examples would be functions that return pointers; they use
NULL or the ERR_PTR mechanism to report failure.


		Chapter 17: Don't re-invent the kernel macros

The header file include/linux/kernel.h contains a number of macros that
you should use, rather than explicitly coding some variant of them yourself.
For example, if you need to calculate the length of an array, take advantage
of the macro

  #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

Similarly, if you need to calculate the size of some structure member, use

  #define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))

There are also min() and max() macros that do strict type checking if you
need them. Feel free to peruse that header file to see what else is already
defined that you shouldn't reproduce in your code.



		Appendix I: References

The C Programming Language, Second Edition
by Brian W. Kernighan and Dennis M. Ritchie.
Prentice Hall, Inc., 1988.
ISBN 0-13-110362-8 (paperback), 0-13-110370-9 (hardback).
URL: http://cm.bell-labs.com/cm/cs/cbook/

The Practice of Programming
by Brian W. Kernighan and Rob Pike.
Addison-Wesley, Inc., 1999.
ISBN 0-201-61586-X.
URL: http://cm.bell-labs.com/cm/cs/tpop/

GNU manuals - where in compliance with K&R and this text - for cpp, gcc,
gcc internals and indent, all available from http://www.gnu.org/manual/

WG14 is the international standardization working group for the programming
language C, URL: http://www.open-std.org/JTC1/SC22/WG14/

Kernel CodingStyle, by greg@kroah.com at OLS 2002:
http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/

--
Last updated on 2006-December-06.