Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

docs: s390: convert docs to ReST and rename to *.rst

Convert all text files with s390 documentation to ReST format.

The original document format is preserved as much as possible.
Still, some of the files required some work in order to read
well both as plain text and after conversion to HTML.

The conversion steps are:
- add blank lines and indentation in order to identify paragraphs;
- fix table markup;
- add some list markup;
- mark literal blocks;
- adjust title markup.

In the new index.rst, add :orphan: for as long as the file is not linked
from the main index.rst, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

Authored by Mauro Carvalho Chehab and committed by Heiko Carstens
8b4a503d dc3988f4

+3091 -2235
+2 -2
Documentation/admin-guide/kernel-parameters.txt
···
478 478 others).
479 479
480 480 ccw_timeout_log [S390]
481 - See Documentation/s390/CommonIO for details.
481 + See Documentation/s390/common_io.rst for details.
482 482
483 483 cgroup_disable= [KNL] Disable a particular controller
484 484 Format: {name of the controller(s) to disable}
···
516 516 /selinux/checkreqprot.
517 517
518 518 cio_ignore= [S390]
519 - See Documentation/s390/CommonIO for details.
519 + See Documentation/s390/common_io.rst for details.
520 520 clk_ignore_unused
521 521 [CLK]
522 522 Prevents the clock framework from automatically gating
+2 -2
Documentation/driver-api/s390-drivers.rst
···
27 27 although they are not the focus of this document.
28 28
29 29 Some additional information can also be found in the kernel source under
30 - Documentation/s390/driver-model.txt.
30 + Documentation/s390/driver-model.rst.
31 31
32 32 The css bus
33 33 ===========
···
38 38 * Standard I/O subchannels, for use by the system. They have a child
39 39 device on the ccw bus and are described below.
40 40 * I/O subchannels bound to the vfio-ccw driver. See
41 - Documentation/s390/vfio-ccw.txt.
41 + Documentation/s390/vfio-ccw.rst.
42 42 * Message subchannels. No Linux driver currently exists.
43 43 * CHSC subchannels (at most one). The chsc subchannel driver can be used
44 44 to send asynchronous chsc commands.
+56 -29
Documentation/s390/3270.txt Documentation/s390/3270.rst
··· 1 + =============================== 1 2 IBM 3270 Display System support 3 + =============================== 2 4 3 5 This file describes the driver that supports local channel attachment 4 6 of IBM 3270 devices. It consists of three sections: 7 + 5 8 * Introduction 6 9 * Installation 7 10 * Operation 8 11 9 12 10 - INTRODUCTION. 13 + Introduction 14 + ============ 11 15 12 16 This paper describes installing and operating 3270 devices under 13 17 Linux/390. A 3270 device is a block-mode rows-and-columns terminal of ··· 21 17 You may have 3270s in-house and not know it. If you're using the 22 18 VM-ESA operating system, define a 3270 to your virtual machine by using 23 19 the command "DEF GRAF <hex-address>" This paper presumes you will be 24 - defining four 3270s with the CP/CMS commands 20 + defining four 3270s with the CP/CMS commands: 25 21 26 - DEF GRAF 620 27 - DEF GRAF 621 28 - DEF GRAF 622 29 - DEF GRAF 623 22 + - DEF GRAF 620 23 + - DEF GRAF 621 24 + - DEF GRAF 622 25 + - DEF GRAF 623 30 26 31 27 Your network connection from VM-ESA allows you to use x3270, tn3270, or 32 28 another 3270 emulator, started from an xterm window on your PC or ··· 38 34 dialed-in x3270. 39 35 40 36 41 - INSTALLATION. 37 + Installation 38 + ============ 42 39 43 40 You install the driver by installing a patch, doing a kernel build, and 44 41 running the configuration script (config3270.sh, in this directory). ··· 64 59 at boot time to a 3270 if it is a 3215. 65 60 66 61 In brief, these are the steps: 62 + 67 63 1. Install the tub3270 patch 68 - 2. (If a module) add a line to a file in /etc/modprobe.d/*.conf 64 + 2. (If a module) add a line to a file in `/etc/modprobe.d/*.conf` 69 65 3. (If VM) define devices with DEF GRAF 70 66 4. Reboot 71 67 5. Configure 72 68 73 69 To test that everything works, assuming VM and x3270, 70 + 74 71 1. Bring up an x3270 window. 75 72 2. Use the DIAL command in that window. 76 73 3. You should immediately see a Linux login screen. 
··· 81 74 82 75 1. The 3270 driver is a part of the official Linux kernel 83 76 source. Build a tree with the kernel source and any necessary 84 - patches. Then do 77 + patches. Then do:: 78 + 85 79 make oldconfig 86 80 (If you wish to disable 3215 console support, edit 87 81 .config; change CONFIG_TN3215's value to "n"; ··· 92 84 make modules_install 93 85 94 86 2. (Perform this step only if you have configured tub3270 as a 95 - module.) Add a line to a file /etc/modprobe.d/*.conf to automatically 87 + module.) Add a line to a file `/etc/modprobe.d/*.conf` to automatically 96 88 load the driver when it's needed. With this line added, you will see 97 89 login prompts appear on your 3270s as soon as boot is complete (or 98 90 with emulated 3270s, as soon as you dial into your vm guest using the 99 91 command "DIAL <vmguestname>"). Since the line-mode major number is 100 - 227, the line to add should be: 92 + 227, the line to add should be:: 93 + 101 94 alias char-major-227 tub3270 102 95 103 96 3. Define graphic devices to your vm guest machine, if you 104 97 haven't already. Define them before you reboot (reipl): 105 - DEFINE GRAF 620 106 - DEFINE GRAF 621 107 - DEFINE GRAF 622 108 - DEFINE GRAF 623 98 + 99 + - DEFINE GRAF 620 100 + - DEFINE GRAF 621 101 + - DEFINE GRAF 622 102 + - DEFINE GRAF 623 109 103 110 104 4. Reboot. The reboot process scans hardware devices, including 111 105 3270s, and this enables the tub3270 driver once loaded to respond ··· 117 107 118 108 5. Run the 3270 configuration script config3270. It is 119 109 distributed in this same directory, Documentation/s390, as 120 - config3270.sh. Inspect the output script it produces, 110 + config3270.sh. Inspect the output script it produces, 121 111 /tmp/mkdev3270, and then run that script. This will create the 122 112 necessary character special device files and make the necessary 123 113 changes to /etc/inittab. 
124 114 125 115 Then notify /sbin/init that /etc/inittab has changed, by issuing 126 - the telinit command with the q operand: 116 + the telinit command with the q operand:: 117 + 127 118 cd Documentation/s390 128 119 sh config3270.sh 129 120 sh /tmp/mkdev3270 130 121 telinit q 131 122 132 - This should be sufficient for your first time. If your 3270 123 + This should be sufficient for your first time. If your 3270 133 124 configuration has changed and you're reusing config3270, you 134 - should follow these steps: 125 + should follow these steps:: 126 + 135 127 Change 3270 configuration 136 128 Reboot 137 129 Run config3270 and /tmp/mkdev3270 ··· 144 132 1. Bring up an x3270 window, or use an actual hardware 3278 or 145 133 3279, or use the 3270 emulator of your choice. You would be 146 134 running the emulator on your PC or workstation. You would use 147 - the command, for example, 135 + the command, for example:: 136 + 148 137 x3270 vm-esa-domain-name & 138 + 149 139 if you wanted a 3278 Model 4 with 43 rows of 80 columns, the 150 140 default model number. The driver does not take advantage of 151 141 extended attributes. ··· 158 144 159 145 2. Use the DIAL command instead of the LOGIN command to connect 160 146 to one of the virtual 3270s you defined with the DEF GRAF 161 - commands: 147 + commands:: 148 + 162 149 dial my-vm-guest-name 163 150 164 151 3. You should immediately see a login prompt from your ··· 186 171 Wrong major number? Wrong minor number? There's your 187 172 problem! 188 173 189 - D. Do you get the message 174 + D. Do you get the message:: 175 + 190 176 "HCPDIA047E my-vm-guest-name 0620 does not exist"? 177 + 191 178 If so, you must issue the command "DEF GRAF 620" from your VM 192 179 3215 console and then reboot the system. 193 180 194 181 195 182 196 183 OPERATION. 184 + ========== 197 185 198 186 The driver defines three areas on the 3270 screen: the log area, the 199 187 input area, and the status area. 
··· 221 203 Running" and nothing typed, the application receives a newline.) 222 204 223 205 You may change the scrolling timeout value. For example, the following 224 - command line: 206 + command line:: 207 + 225 208 echo scrolltime=60 > /proc/tty/driver/tty3270 209 + 226 210 changes the scrolling timeout value to 60 sec. Set scrolltime to 0 if 227 211 you wish to prevent scrolling entirely. 228 212 ··· 248 228 No PF key is preassigned to cause a job suspension, but you may cause a 249 229 job suspension by typing "^Z" and hitting ENTER. You may wish to 250 230 assign this function to a PF key. To make PF7 cause job suspension, 251 - execute the command: 231 + execute the command:: 232 + 252 233 echo pf7=^z > /proc/tty/driver/tty3270 253 234 254 235 If the input you type does not end with the two characters "^n", the ··· 264 243 invisible (such as for password entry) and it is not identical to the 265 244 current top entry. PF10 rotates backward through the command stack; 266 245 PF11 rotates forward. You may assign the backward function to any PF 267 - key (or PA key, for that matter), say, PA3, with the command: 246 + key (or PA key, for that matter), say, PA3, with the command:: 247 + 268 248 echo -e pa3=\\033k > /proc/tty/driver/tty3270 249 + 269 250 This assigns the string ESC-k to PA3. Similarly, the string ESC-j 270 251 performs the forward function. (Rationale: In bash with vi-mode line 271 252 editing, ESC-k and ESC-j retrieve backward and forward history. ··· 275 252 276 253 Is a stack size of twenty commands not to your liking? Change it on 277 254 the fly. To change to saving the last 100 commands, execute the 278 - command: 255 + command:: 256 + 279 257 echo recallsize=100 > /proc/tty/driver/tty3270 280 258 281 259 Have a command you issue frequently? Assign it to a PF or PA key! 
Use 282 - the command 283 - echo pf24="mkdir foobar; cd foobar" > /proc/tty/driver/tty3270 260 + the command:: 261 + 262 + echo pf24="mkdir foobar; cd foobar" > /proc/tty/driver/tty3270 263 + 284 264 to execute the commands mkdir foobar and cd foobar immediately when you 285 265 hit PF24. Want to see the command line first, before you execute it? 286 - Use the -n option of the echo command: 266 + Use the -n option of the echo command:: 267 + 287 268 echo -n pf24="mkdir foo; cd foo" > /proc/tty/driver/tty3270 288 269 289 270
+32 -17
Documentation/s390/CommonIO Documentation/s390/common_io.rst
···
1 - S/390 common I/O-Layer - command line parameters, procfs and debugfs entries
2 - ============================================================================
1 + ======================
2 + S/390 common I/O-Layer
3 + ======================
4 +
5 + command line parameters, procfs and debugfs entries
6 + ===================================================
3 7
4 8 Command line parameters
5 9 -----------------------
···
17 13 device := {all | [!]ipldev | [!]condev | [!]<devno> | [!]<devno>-<devno>}
18 14
19 15 The given devices will be ignored by the common I/O-layer; no detection
20 - and device sensing will be done on any of those devices. The subchannel to
16 + and device sensing will be done on any of those devices. The subchannel to
21 17 which the device in question is attached will be treated as if no device was
22 18 attached.
23 19
···
32 28 keywords can be used to refer to the CCW based boot device and CCW console
33 29 device respectively (these are probably useful only when combined with the '!'
34 30 operator). The '!' operator will cause the I/O-layer to _not_ ignore a device.
35 - The command line is parsed from left to right.
31 + The command line
32 + is parsed from left to right.
36 33
37 - For example,
34 + For example::
35 +
38 36 cio_ignore=0.0.0023-0.0.0042,0.0.4711
37 +
39 38 will ignore all devices ranging from 0.0.0023 to 0.0.0042 and the device
40 39 0.0.4711, if detected.
41 - As another example,
40 +
41 + As another example::
42 +
42 43 cio_ignore=all,!0.0.4711,!0.0.fd00-0.0.fd02
44 +
43 45 will ignore all devices but 0.0.4711, 0.0.fd00, 0.0.fd01, 0.0.fd02.
44 46
45 47 By default, no devices are ignored.
···
58 48
59 49 Lists the ranges of devices (by bus id) which are ignored by common I/O.
60 50
61 - You can un-ignore certain or all devices by piping to /proc/cio_ignore.
62 - "free all" will un-ignore all ignored devices,
51 + You can un-ignore certain or all devices by piping to /proc/cio_ignore.
52 + "free all" will un-ignore all ignored devices,
63 53 "free <device range>, <device range>, ..." will un-ignore the specified
64 54 devices.
65 55
66 56 For example, if devices 0.0.0023 to 0.0.0042 and 0.0.4711 are ignored,
57 +
67 58 - echo free 0.0.0030-0.0.0032 > /proc/cio_ignore
68 59 will un-ignore devices 0.0.0030 to 0.0.0032 and will leave devices 0.0.0023
69 60 to 0.0.002f, 0.0.0033 to 0.0.0042 and 0.0.4711 ignored;
70 61 - echo free 0.0.0041 > /proc/cio_ignore will furthermore un-ignore device
71 62 0.0.0041;
72 - - echo free all > /proc/cio_ignore will un-ignore all remaining ignored
63 + - echo free all > /proc/cio_ignore will un-ignore all remaining ignored
73 64 devices.
74 65
75 - When a device is un-ignored, device recognition and sensing is performed and
66 + When a device is un-ignored, device recognition and sensing is performed and
76 67 the device driver will be notified if possible, so the device will become
77 68 available to the system. Note that un-ignoring is performed asynchronously.
78 69
79 - You can also add ranges of devices to be ignored by piping to
70 + You can also add ranges of devices to be ignored by piping to
80 71 /proc/cio_ignore; "add <device range>, <device range>, ..." will ignore the
81 72 specified devices.
82 73
83 74 Note: While already known devices can be added to the list of devices to be
84 - ignored, there will be no effect on then. However, if such a device
75 + ignored, there will be no effect on then. However, if such a device
85 76 disappears and then reappears, it will then be ignored. To make
86 77 known devices go away, you need the "purge" command (see below).
87 78
88 - For example,
79 + For example::
80 +
89 81 "echo add 0.0.a000-0.0.accc, 0.0.af00-0.0.afff > /proc/cio_ignore"
82 +
90 83 will add 0.0.a000-0.0.accc and 0.0.af00-0.0.afff to the list of ignored
91 84 devices.
92 85
93 - You can remove already known but now ignored devices via
86 + You can remove already known but now ignored devices via::
87 +
94 88 "echo purge > /proc/cio_ignore"
89 +
95 90 All devices ignored but still registered and not online (= not in use)
96 91 will be deregistered and thus removed from the system.
97 92
···
130 115 Various debug messages from the common I/O-layer.
131 116
132 117 - /sys/kernel/debug/s390dbf/cio_trace/hex_ascii
133 - Logs the calling of functions in the common I/O-layer and, if applicable,
118 + Logs the calling of functions in the common I/O-layer and, if applicable,
134 119 which subchannel they were called for, as well as dumps of some data
135 120 structures (like irb in an error case).
136 121
137 - The level of logging can be changed to be more or less verbose by piping to
122 + The level of logging can be changed to be more or less verbose by piping to
138 123 /sys/kernel/debug/s390dbf/cio_*/level a number between 0 and 6; see the
139 - documentation on the S/390 debug feature (Documentation/s390/s390dbf.txt)
124 + documentation on the S/390 debug feature (Documentation/s390/s390dbf.rst)
140 125 for details.
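Reviewer sketch (not part of the patch): the common_io text above says the cio_ignore list is parsed from left to right, that '!' un-ignores a device, and that no devices are ignored by default. The Python model below illustrates that rule under stated assumptions; it is not the kernel's parser. The function names are hypothetical, the `ipldev` and `condev` keywords are skipped because they depend on boot-time state, and comparing ranges as whole bus-id tuples is a simplification.

```python
def parse_bus_id(text):
    """Turn a bus id such as '0.0.fd00' into a comparable tuple of ints."""
    cssid, ssid, devno = text.split(".")
    return (int(cssid, 16), int(ssid, 16), int(devno, 16))


def is_ignored(param, device):
    """Model the left-to-right rule: the last matching entry decides.

    `param` is the value after 'cio_ignore=' (hypothetical helper, an
    illustration only, not the kernel implementation).
    """
    target = parse_bus_id(device)
    ignored = False  # by default, no devices are ignored
    for entry in param.split(","):
        entry = entry.strip()
        negate = entry.startswith("!")
        spec = entry[1:] if negate else entry
        if spec in ("ipldev", "condev"):
            continue  # assumption: boot/console device lookup is not modelled
        if spec == "all":
            matches = True
        else:
            low, _, high = spec.partition("-")
            first = parse_bus_id(low)
            last = parse_bus_id(high) if high else first
            matches = first <= target <= last
        if matches:
            ignored = not negate
    return ignored


if __name__ == "__main__":
    print(is_ignored("all,!0.0.4711,!0.0.fd00-0.0.fd02", "0.0.fd01"))
    print(is_ignored("0.0.0023-0.0.0042,0.0.4711", "0.0.0030"))
```

Because later entries override earlier ones in this model, `all` followed by `!<devno>` entries behaves as in the second documented example: everything is ignored except the listed devices.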
+22 -11
Documentation/s390/DASD Documentation/s390/dasd.rst
···
1 + ==================
1 2 DASD device driver
3 + ==================
2 4
3 5 S/390's disk devices (DASDs) are managed by Linux via the DASD device
4 6 driver. It is valid for all types of DASDs and represents them to
···
16 14 If you supply kernel parameters the different instances are processed
17 15 in order of appearance and a minor number is reserved for any device
18 16 covered by the supplied range up to 64 volumes. Additional DASDs are
19 - ignored. If you do not supply the 'dasd=' kernel parameter at all, the
17 + ignored. If you do not supply the 'dasd=' kernel parameter at all, the
20 18 DASD driver registers all supported DASDs of your system to a minor
21 19 number in ascending order of the subchannel number.
22 20
23 21 The driver currently supports ECKD-devices and there are stubs for
24 22 support of the FBA and CKD architectures. For the FBA architecture
25 23 only some smart data structures are missing to make the support
26 - complete.
24 + complete.
27 25 We performed our testing on 3380 and 3390 type disks of different
28 26 sizes, under VM and on the bare hardware (LPAR), using internal disks
29 27 of the multiprise as well as a RAMAC virtual array. Disks exported by
···
36 34 provide support of partitions, maybe VTOC oriented or using a kind of
37 35 partition table in the label record.
38 36
39 - USAGE
37 + Usage
38 + =====
40 39
41 40 -Low-level format (?CKD only)
42 41 For using an ECKD-DASD as a Linux harddisk you have to low-level
43 42 format the tracks by issuing the BLKDASDFORMAT-ioctl on that
44 43 device. This will erase any data on that volume including IBM volume
45 - labels, VTOCs etc. The ioctl may take a 'struct format_data *' or
46 - 'NULL' as an argument.
47 - typedef struct {
44 + labels, VTOCs etc. The ioctl may take a `struct format_data *` or
45 + 'NULL' as an argument::
46 +
47 + typedef struct {
48 48 int start_unit;
49 49 int stop_unit;
50 50 int blksize;
51 - } format_data_t;
51 + } format_data_t;
52 +
52 53 When a NULL argument is passed to the BLKDASDFORMAT ioctl the whole
53 54 disk is formatted to a blocksize of 1024 bytes. Otherwise start_unit
54 55 and stop_unit are the first and last track to be formatted. If
···
61 56 1kB blocks anyway and you gain approx. 50% of capacity increasing your
62 57 blksize from 512 byte to 1kB.
63 58
64 - -Make a filesystem
59 + Make a filesystem
60 + =================
61 +
65 62 Then you can mk??fs the filesystem of your choice on that volume or
66 63 partition. For reasons of sanity you should build your filesystem on
67 - the partition /dev/dd?1 instead of the whole volume. You only lose 3kB
64 + the partition /dev/dd?1 instead of the whole volume. You only lose 3kB
68 65 but may be sure that you can reuse your data after introduction of a
69 66 real partition table.
70 67
71 - BUGS:
68 + Bugs
69 + ====
70 +
71 72 - Performance sometimes is rather low because we don't fully exploit clustering
73 72
74 - TODO-List:
73 + TODO-List
74 + =========
75 +
75 76 - Add IBM'S Disk layout to genhd
76 77 - Enhance driver to use more than one major number
77 78 - Enable usage as a module
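Reviewer sketch (not part of the patch): the DASD text above describes the BLKDASDFORMAT argument as a struct of three ints, where start_unit and stop_unit are the first and last track to be formatted. The Python snippet below only mirrors that layout with ctypes. The class name, the helper, and its range check are hypothetical, and no ioctl is issued because the request number does not appear in the text above.

```python
import ctypes


class FormatData(ctypes.Structure):
    """Mirror of the three-int format_data_t shown in the DASD hunk."""
    _fields_ = [
        ("start_unit", ctypes.c_int),  # first track to be formatted
        ("stop_unit", ctypes.c_int),   # last track to be formatted
        ("blksize", ctypes.c_int),     # block size in bytes
    ]


def make_format_request(start_unit, stop_unit, blksize):
    """Build the argument buffer; the range check is an assumption."""
    if start_unit > stop_unit:
        raise ValueError("start_unit must not exceed stop_unit")
    return FormatData(start_unit, stop_unit, blksize)


if __name__ == "__main__":
    request = make_format_request(0, 9, 1024)
    print(ctypes.sizeof(request), request.start_unit, request.stop_unit)
```

Passing NULL instead of such a buffer corresponds to the documented default of formatting the whole disk with a block size of 1024 bytes.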
+1416 -975
Documentation/s390/Debugging390.txt Documentation/s390/debugging390.rst
··· 1 + ============================================= 2 + Debugging on Linux for s/390 & z/Architecture 3 + ============================================= 1 4 2 - Debugging on Linux for s/390 & z/Architecture 3 - by 4 - Denis Joseph Barrow (djbarrow@de.ibm.com,barrow_dj@yahoo.com) 5 - Copyright (C) 2000-2001 IBM Deutschland Entwicklung GmbH, IBM Corporation 6 - Best viewed with fixed width fonts 5 + Denis Joseph Barrow (djbarrow@de.ibm.com,barrow_dj@yahoo.com) 6 + 7 + Copyright (C) 2000-2001 IBM Deutschland Entwicklung GmbH, IBM Corporation 8 + 9 + .. Best viewed with fixed width fonts 7 10 8 11 Overview of Document: 9 12 ===================== ··· 20 17 to be printed out & used as a quick cheat sheet self help style reference when 21 18 problems occur. 22 19 23 - Contents 24 - ======== 25 - Register Set 26 - Address Spaces on Intel Linux 27 - Address Spaces on Linux for s/390 & z/Architecture 28 - The Linux for s/390 & z/Architecture Kernel Task Structure 29 - Register Usage & Stackframes on Linux for s/390 & z/Architecture 30 - A sample program with comments 31 - Compiling programs for debugging on Linux for s/390 & z/Architecture 32 - Debugging under VM 33 - s/390 & z/Architecture IO Overview 34 - Debugging IO on s/390 & z/Architecture under VM 35 - GDB on s/390 & z/Architecture 36 - Stack chaining in gdb by hand 37 - Examining core dumps 38 - ldd 39 - Debugging modules 40 - The proc file system 41 - SysRq 42 - References 43 - Special Thanks 20 + .. 
Contents 21 + ======== 22 + Register Set 23 + Address Spaces on Intel Linux 24 + Address Spaces on Linux for s/390 & z/Architecture 25 + The Linux for s/390 & z/Architecture Kernel Task Structure 26 + Register Usage & Stackframes on Linux for s/390 & z/Architecture 27 + A sample program with comments 28 + Compiling programs for debugging on Linux for s/390 & z/Architecture 29 + Debugging under VM 30 + s/390 & z/Architecture IO Overview 31 + Debugging IO on s/390 & z/Architecture under VM 32 + GDB on s/390 & z/Architecture 33 + Stack chaining in gdb by hand 34 + Examining core dumps 35 + ldd 36 + Debugging modules 37 + The proc file system 38 + SysRq 39 + References 40 + Special Thanks 44 41 45 42 Register Set 46 43 ============ 47 44 The current architectures have the following registers. 48 - 45 + 49 46 16 General propose registers, 32 bit on s/390 and 64 bit on z/Architecture, 50 47 r0-r15 (or gpr0-gpr15), used for arithmetic and addressing. 51 48 ··· 62 59 64 bit pointer) is currently used by the pthread library as a pointer to 63 60 the current running threads private area. 64 61 65 - 16 64 bit floating point registers (fp0-fp15 ) IEEE & HFP floating 66 - point format compliant on G5 upwards & a Floating point control reg (FPC) 67 - 4 64 bit registers (fp0,fp2,fp4 & fp6) HFP only on older machines. 62 + 16 64-bit floating point registers (fp0-fp15 ) IEEE & HFP floating 63 + point format compliant on G5 upwards & a Floating point control reg (FPC) 64 + 65 + 4 64-bit registers (fp0,fp2,fp4 & fp6) HFP only on older machines. 66 + 68 67 Note: 69 - Linux (currently) always uses IEEE & emulates G5 IEEE format on older machines, 70 - ( provided the kernel is configured for this ). 68 + Linux (currently) always uses IEEE & emulates G5 IEEE format on older 69 + machines, ( provided the kernel is configured for this ). 
71 70 72 71 73 72 The PSW is the most important register on the machine it 74 - is 64 bit on s/390 & 128 bit on z/Architecture & serves the roles of 73 + is 64 bit on s/390 & 128 bit on z/Architecture & serves the roles of 75 74 a program counter (pc), condition code register,memory space designator. 76 75 In IBM standard notation I am counting bit 0 as the MSB. 77 76 It has several advantages over a normal program counter 78 - in that you can change address translation & program counter 77 + in that you can change address translation & program counter 79 78 in a single instruction. To change address translation, 80 79 e.g. switching address translation off requires that you 81 80 have a logical=physical mapping for the address you are ··· 211 206 z/Architecture and is exchanged with one page on s/390 or two pages on 212 207 z/Architecture in absolute storage by the set prefix instruction during Linux 213 208 startup. 209 + 214 210 This page is mapped to a different prefix for each processor in an SMP 215 211 configuration (assuming the OS designer is sane of course). 212 + 216 213 Bytes 0-512 (200 hex) on s/390 and 0-512, 4096-4544, 4604-5119 currently on 217 214 z/Architecture are used by the processor itself for holding such information 218 215 as exception indications and entry points for exceptions. 216 + 219 217 Bytes after 0xc00 hex are used by linux for per processor globals on s/390 and 220 218 z/Architecture (there is a gap on z/Architecture currently between 0xc00 and 221 219 0x1000, too, which is used by Linux). 220 + 222 221 The closest thing to this on traditional architectures is the interrupt 223 222 vector table. This is a good thing & does simplify some of the kernel coding 224 223 however it means that we now cannot catch stray NULL pointers in the ··· 234 225 ============================= 235 226 236 227 The traditional Intel Linux is approximately mapped as follows forgive 237 - the ascii art. 
238 - 0xFFFFFFFF 4GB Himem ***************** 239 - * * 240 - * Kernel Space * 241 - * * 242 - ***************** **************** 243 - User Space Himem * User Stack * * * 244 - (typically 0xC0000000 3GB ) ***************** * * 245 - * Shared Libs * * Next Process * 246 - ***************** * to * 247 - * * <== * Run * <== 248 - * User Program * * * 249 - * Data BSS * * * 250 - * Text * * * 251 - * Sections * * * 252 - 0x00000000 ***************** **************** 228 + the ascii art:: 229 + 230 + 0xFFFFFFFF 4GB Himem ***************** 231 + * * 232 + * Kernel Space * 233 + * * 234 + ***************** **************** 235 + User Space Himem * User Stack * * * 236 + (typically 0xC0000000 3GB ) ***************** * * 237 + * Shared Libs * * Next Process * 238 + ***************** * to * 239 + * * <== * Run * <== 240 + * User Program * * * 241 + * Data BSS * * * 242 + * Text * * * 243 + * Sections * * * 244 + 0x00000000 ***************** **************** 253 245 254 246 Now it is easy to see that on Intel it is quite easy to recognise a kernel 255 247 address as being one greater than user space himem (in this case 0xC0000000), 256 248 and addresses of less than this are the ones in the current running program on 257 249 this processor (if an smp box). 250 + 258 251 If using the virtual machine ( VM ) as a debugger it is quite difficult to 259 252 know which user process is running as the address space you are looking at 260 253 could be from any process in the run queue. ··· 267 256 This means that on Intel the kernel linux can typically only address 268 257 Himem=0xFFFFFFFF-0xC0000000=1GB & this is all the RAM these machines 269 258 can typically use. 259 + 270 260 They can lower User Himem to 2GB or lower & thus be 271 261 able to use 2GB of RAM however this shrinks the maximum size 272 262 of User Space from 3GB to 2GB they have a no win limit of 4GB unless ··· 276 264 277 265 On 390 our limitations & strengths make us slightly different. 
278 266 For backward compatibility we are only allowed use 31 bits (2GB) 279 - of our 32 bit addresses, however, we use entirely separate address 267 + of our 32 bit addresses, however, we use entirely separate address 280 268 spaces for the user & kernel. 281 269 282 270 This means we can support 2GB of non Extended RAM on s/390, & more 283 - with the Extended memory management swap device & 271 + with the Extended memory management swap device & 284 272 currently 4TB of physical memory currently on z/Architecture. 285 273 286 274 287 275 Address Spaces on Linux for s/390 & z/Architecture 288 276 ================================================== 289 277 290 - Our addressing scheme is basically as follows: 278 + Our addressing scheme is basically as follows:: 291 279 292 - Primary Space Home Space 293 - Himem 0x7fffffff 2GB on s/390 ***************** **************** 294 - currently 0x3ffffffffff (2^42)-1 * User Stack * * * 295 - on z/Architecture. ***************** * * 296 - * Shared Libs * * * 297 - ***************** * * 298 - * * * Kernel * 299 - * User Program * * * 300 - * Data BSS * * * 301 - * Text * * * 302 - * Sections * * * 303 - 0x00000000 ***************** **************** 280 + Primary Space Home Space 281 + Himem 0x7fffffff 2GB on s/390 ***************** **************** 282 + currently 0x3ffffffffff (2^42)-1 * User Stack * * * 283 + on z/Architecture. ***************** * * 284 + * Shared Libs * * * 285 + ***************** * * 286 + * * * Kernel * 287 + * User Program * * * 288 + * Data BSS * * * 289 + * Text * * * 290 + * Sections * * * 291 + 0x00000000 ***************** **************** 304 292 305 293 This also means that we need to look at the PSW problem state bit and the 306 294 addressing mode to decide whether we are looking at user or kernel space. 
··· 316 304 When also looking at the ASCE control registers, this means: 317 305 318 306 User space: 307 + 319 308 - runs in primary or access register mode 320 309 - cr1 contains the user asce 321 310 - cr7 contains the user asce 322 311 - cr13 contains the kernel asce 323 312 324 313 Kernel space: 314 + 325 315 - runs in home space mode 326 316 - cr1 contains the user or kernel asce 327 - -> the kernel asce is loaded when a uaccess requires primary or 328 - secondary address mode 317 + 318 + - the kernel asce is loaded when a uaccess requires primary or 319 + secondary address mode 320 + 329 321 - cr7 contains the user or kernel asce, (changed with set_fs()) 330 322 - cr13 contains the kernel asce 331 323 332 324 In case of uaccess the kernel changes to: 325 + 333 326 - primary space mode in case of a uaccess (copy_to_user) and uses 334 327 e.g. the mvcp instruction to access user space. However the kernel 335 328 will stay in home space mode if the mvcos instruction is available ··· 354 337 A virtual address on s/390 is made up of 3 parts 355 338 The SX (segment index, roughly corresponding to the PGD & PMD in Linux 356 339 terminology) being bits 1-11. 340 + 357 341 The PX (page index, corresponding to the page table entry (pte) in Linux 358 342 terminology) being bits 12-19. 343 + 359 344 The remaining bits BX (the byte index are the offset in the page ) 360 345 i.e. bits 20 to 31. 361 346 362 347 On z/Architecture in linux we currently make up an address from 4 parts. 363 - The region index bits (RX) 0-32 we currently use bits 22-32 364 - The segment index (SX) being bits 33-43 365 - The page index (PX) being bits 44-51 366 - The byte index (BX) being bits 52-63 348 + 349 + - The region index bits (RX) 0-32 we currently use bits 22-32 350 + - The segment index (SX) being bits 33-43 351 + - The page index (PX) being bits 44-51 352 + - The byte index (BX) being bits 52-63 367 353 368 354 Notes: 369 - 1) s/390 has no PMD so the PMD is really the PGD also. 
370 - A lot of this stuff is defined in pgtable.h. 355 + 1) s/390 has no PMD so the PMD is really the PGD also. 356 + A lot of this stuff is defined in pgtable.h. 371 357 372 - 2) Also seeing as s/390's page indexes are only 1k in size 373 - (bits 12-19 x 4 bytes per pte ) we use 1 ( page 4k ) 374 - to make the best use of memory by updating 4 segment indices 375 - entries each time we mess with a PMD & use offsets 376 - 0,1024,2048 & 3072 in this page as for our segment indexes. 377 - On z/Architecture our page indexes are now 2k in size 378 - ( bits 12-19 x 8 bytes per pte ) we do a similar trick 379 - but only mess with 2 segment indices each time we mess with 380 - a PMD. 358 + 2) Also seeing as s/390's page indexes are only 1k in size 359 + (bits 12-19 x 4 bytes per pte ) we use 1 ( page 4k ) 360 + to make the best use of memory by updating 4 segment indices 361 + entries each time we mess with a PMD & use offsets 362 + 0,1024,2048 & 3072 in this page as for our segment indexes. 363 + On z/Architecture our page indexes are now 2k in size 364 + ( bits 12-19 x 8 bytes per pte ) we do a similar trick 365 + but only mess with 2 segment indices each time we mess with 366 + a PMD. 381 367 382 - 3) As z/Architecture supports up to a massive 5-level page table lookup we 383 - can only use 3 currently on Linux ( as this is all the generic kernel 384 - currently supports ) however this may change in future 385 - this allows us to access ( according to my sums ) 386 - 4TB of virtual storage per process i.e. 387 - 4096*512(PTES)*1024(PMDS)*2048(PGD) = 4398046511104 bytes, 388 - enough for another 2 or 3 of years I think :-). 389 - to do this we use a region-third-table designation type in 390 - our address space control registers. 
391 -
368 + 3) As z/Architecture supports up to a massive 5-level page table lookup we
369 + can only use 3 currently on Linux ( as this is all the generic kernel
370 + currently supports ) however this may change in future
371 + this allows us to access ( according to my sums )
372 + 4TB of virtual storage per process i.e.
373 + 4096*512(PTES)*1024(PMDS)*2048(PGD) = 4398046511104 bytes,
374 + enough for another 2 or 3 of years I think :-).
375 + to do this we use a region-third-table designation type in
376 + our address space control registers.
377 +
392 378
393 379 The Linux for s/390 & z/Architecture Kernel Task Structure
394 380 ==========================================================
···
402 382 (which we use for per-processor globals).
403 383
404 384 The kernel stack pointer is intimately tied with the task structure for
405 - each processor as follows.
385 + each processor as follows::
406 386
407 - s/390
408 - ************************
409 - * 1 page kernel stack *
410 - * ( 4K ) *
411 - ************************
412 - * 1 page task_struct *
413 - * ( 4K ) *
414 - 8K aligned ************************
387 + s/390
388 + ************************
389 + * 1 page kernel stack *
390 + * ( 4K ) *
391 + ************************
392 + * 1 page task_struct *
393 + * ( 4K ) *
394 + 8K aligned ************************
415 395
416 - z/Architecture
417 - ************************
418 - * 2 page kernel stack *
419 - * ( 8K ) *
420 - ************************
421 - * 2 page task_struct *
422 - * ( 8K ) *
423 - 16K aligned ************************
396 + z/Architecture
397 + ************************
398 + * 2 page kernel stack *
399 + * ( 8K ) *
400 + ************************
401 + * 2 page task_struct *
402 + * ( 8K ) *
403 + 16K aligned ************************
424 404
425 405 What this means is that we don't need to dedicate any register or global
426 406 variable to point to the current running process & can retrieve it with the
427 - following very simple construct for s/390 & one very similar for z/Architecture.
407 + following very simple construct for s/390 & one very similar for
408 + z/Architecture::
428 409
429 - static inline struct task_struct * get_current(void)
430 - {
431 - struct task_struct *current;
432 - __asm__("lhi %0,-8192\n\t"
433 - "nr %0,15"
434 - : "=r" (current) );
435 - return current;
436 - }
410 + static inline struct task_struct * get_current(void)
411 + {
412 + struct task_struct *current;
413 + __asm__("lhi %0,-8192\n\t"
414 + "nr %0,15"
415 + : "=r" (current) );
416 + return current;
417 + }
437 418
438 419 i.e. just anding the current kernel stack pointer with the mask -8192.
439 420 Thankfully because Linux doesn't have support for nested IO interrupts
440 - & our devices have large buffers can survive interrupts being shut for
421 + & our devices have large buffers can survive interrupts being shut for
441 422 short amounts of time we don't need a separate stack for interrupts.
442 423
443 424
···
449 428 Overview:
450 429 ---------
451 430 This is the code that gcc produces at the top & the bottom of
452 - each function. It usually is fairly consistent & similar from
431 + each function. It usually is fairly consistent & similar from
453 432 function to function & if you know its layout you can probably
454 433 make some headway in finding the ultimate cause of a problem
455 434 after a crash without a source level debugger.
···
464 443 Glossary:
465 444 ---------
466 445 alloca:
467 - This is a built in compiler function for runtime allocation
468 - of extra space on the callers stack which is obviously freed
469 - up on function exit ( e.g. the caller may choose to allocate nothing
470 - of a buffer of 4k if required for temporary purposes ), it generates
471 - very efficient code ( a few cycles ) when compared to alternatives
472 - like malloc.
446 + This is a built in compiler function for runtime allocation
447 + of extra space on the callers stack which is obviously freed
448 + up on function exit ( e.g. the caller may choose to allocate nothing
449 + of a buffer of 4k if required for temporary purposes ), it generates
450 + very efficient code ( a few cycles ) when compared to alternatives
451 + like malloc.
473 452
474 - automatics: These are local variables on the stack,
475 - i.e they aren't in registers & they aren't static.
453 + automatics:
454 + These are local variables on the stack, i.e they aren't in registers &
455 + they aren't static.
476 456
477 457 back-chain:
478 - This is a pointer to the stack pointer before entering a
479 - framed functions ( see frameless function ) prologue got by
480 - dereferencing the address of the current stack pointer,
481 - i.e. got by accessing the 32 bit value at the stack pointers
482 - current location.
458 + This is a pointer to the stack pointer before entering a
459 + framed functions ( see frameless function ) prologue got by
460 + dereferencing the address of the current stack pointer,
461 + i.e. got by accessing the 32 bit value at the stack pointers
462 + current location.
483 463
484 464 base-pointer:
485 - This is a pointer to the back of the literal pool which
486 - is an area just behind each procedure used to store constants
487 - in each function.
465 + This is a pointer to the back of the literal pool which
466 + is an area just behind each procedure used to store constants
467 + in each function.
488 468
489 - call-clobbered: The caller probably needs to save these registers if there
490 - is something of value in them, on the stack or elsewhere before making a
491 - call to another procedure so that it can restore it later.
469 + call-clobbered:
470 + The caller probably needs to save these registers if there
471 + is something of value in them, on the stack or elsewhere before making a
472 + call to another procedure so that it can restore it later.
492 473
493 474 epilogue:
494 - The code generated by the compiler to return to the caller.
475 + The code generated by the compiler to return to the caller.
495 476
496 - frameless-function
497 - A frameless function in Linux for s390 & z/Architecture is one which doesn't
498 - need more than the register save area (96 bytes on s/390, 160 on z/Architecture)
499 - given to it by the caller.
500 - A frameless function never:
501 - 1) Sets up a back chain.
502 - 2) Calls alloca.
503 - 3) Calls other normal functions
504 - 4) Has automatics.
477 + frameless-function:
478 + A frameless function in Linux for s390 & z/Architecture is one which doesn't
479 + need more than the register save area (96 bytes on s/390, 160 on z/Architecture)
480 + given to it by the caller.
481 +
482 + A frameless function never:
483 +
484 + 1) Sets up a back chain.
485 + 2) Calls alloca.
486 + 3) Calls other normal functions
487 + 4) Has automatics.
505 488
506 489 GOT-pointer:
507 - This is a pointer to the global-offset-table in ELF
508 - ( Executable Linkable Format, Linux'es most common executable format ),
509 - all globals & shared library objects are found using this pointer.
490 + This is a pointer to the global-offset-table in ELF
491 + ( Executable Linkable Format, Linux'es most common executable format ),
492 + all globals & shared library objects are found using this pointer.
510 493
511 494 lazy-binding
512 - ELF shared libraries are typically only loaded when routines in the shared
513 - library are actually first called at runtime. This is lazy binding.
495 + ELF shared libraries are typically only loaded when routines in the shared
496 + library are actually first called at runtime. This is lazy binding.
514 497
515 498 procedure-linkage-table
516 - This is a table found from the GOT which contains pointers to routines
517 - in other shared libraries which can't be called to by easier means.
499 + This is a table found from the GOT which contains pointers to routines
500 + in other shared libraries which can't be called to by easier means.
518 501
519 502 prologue:
520 - The code generated by the compiler to set up the stack frame.
503 + The code generated by the compiler to set up the stack frame.
521 504
522 505 outgoing-args:
523 - This is extra area allocated on the stack of the calling function if the
524 - parameters for the callee's cannot all be put in registers, the same
525 - area can be reused by each function the caller calls.
506 + This is extra area allocated on the stack of the calling function if the
507 + parameters for the callee's cannot all be put in registers, the same
508 + area can be reused by each function the caller calls.
526 509
527 510 routine-descriptor:
528 - A COFF executable format based concept of a procedure reference
529 - actually being 8 bytes or more as opposed to a simple pointer to the routine.
530 - This is typically defined as follows
531 - Routine Descriptor offset 0=Pointer to Function
532 - Routine Descriptor offset 4=Pointer to Table of Contents
533 - The table of contents/TOC is roughly equivalent to a GOT pointer.
534 - & it means that shared libraries etc. can be shared between several
535 - environments each with their own TOC.
511 + A COFF executable format based concept of a procedure reference
512 + actually being 8 bytes or more as opposed to a simple pointer to the routine.
513 + This is typically defined as follows:
536 514
537 -
538 - static-chain: This is used in nested functions a concept adopted from pascal
539 - by gcc not used in ansi C or C++ ( although quite useful ), basically it
540 - is a pointer used to reference local variables of enclosing functions.
541 - You might come across this stuff once or twice in your lifetime.
515 + - Routine Descriptor offset 0=Pointer to Function
516 + - Routine Descriptor offset 4=Pointer to Table of Contents
542 517
543 - e.g.
544 - The function below should return 11 though gcc may get upset & toss warnings
545 - about unused variables.
546 - int FunctionA(int a)
547 - {
518 + The table of contents/TOC is roughly equivalent to a GOT pointer.
519 + & it means that shared libraries etc. can be shared between several
520 + environments each with their own TOC.
521 +
522 + static-chain:
523 + This is used in nested functions a concept adopted from pascal
524 + by gcc not used in ansi C or C++ ( although quite useful ), basically it
525 + is a pointer used to reference local variables of enclosing functions.
526 + You might come across this stuff once or twice in your lifetime.
527 +
528 + e.g.
529 +
530 + The function below should return 11 though gcc may get upset & toss warnings
531 + about unused variables::
532 +
533 + int FunctionA(int a)
534 + {
548 535 int b;
549 536 FunctionC(int c)
550 537 {
···
560 531 }
561 532 FunctionC(10);
562 533 return(b);
563 - }
534 + }
564 535
565 536
566 537 s/390 & z/Architecture Register usage
567 538 =====================================
539 +
540 + ======== ========================================== ===============
568 541 r0 used by syscalls/assembly call-clobbered
569 - r1 used by syscalls/assembly call-clobbered
542 + r1 used by syscalls/assembly call-clobbered
570 543 r2 argument 0 / return value 0 call-clobbered
571 544 r3 argument 1 / return value 1 (if long long) call-clobbered
572 545 r4 argument 2 call-clobbered
573 546 r5 argument 3 call-clobbered
574 - r6 argument 4 saved
575 - r7 pointer-to arguments 5 to ... saved
547 + r6 argument 4 saved
548 + r7 pointer-to arguments 5 to ... saved
576 549 r8 this & that saved
577 550 r9 this & that saved
578 551 r10 static-chain ( if nested function ) saved
···
588 557 f2 argument 1 call-clobbered
589 558 f4 z/Architecture argument 2 saved
590 559 f6 z/Architecture argument 3 saved
560 + ======== ========================================== ===============
561 +
591 562 The remaining floating points
592 563 f1,f3,f5 f7-f15 are call-clobbered.
593 564
594 565 Notes:
595 566 ------
596 567 1) The only requirement is that registers which are used
597 - by the callee are saved, e.g. the compiler is perfectly
598 - capable of using r11 for purposes other than a frame a
599 - frame pointer if a frame pointer is not needed.
600 - 2) In functions with variable arguments e.g. printf the calling procedure
601 - is identical to one without variable arguments & the same number of
602 - parameters. However, the prologue of this function is somewhat more
603 - hairy owing to it having to move these parameters to the stack to
604 - get va_start, va_arg & va_end to work.
568 + by the callee are saved, e.g. the compiler is perfectly
569 + capable of using r11 for purposes other than a frame a
570 + frame pointer if a frame pointer is not needed.
571 + 2) In functions with variable arguments e.g. printf the calling procedure
572 + is identical to one without variable arguments & the same number of
573 + parameters. However, the prologue of this function is somewhat more
574 + hairy owing to it having to move these parameters to the stack to
575 + get va_start, va_arg & va_end to work.
605 576 3) Access registers are currently unused by gcc but are used in
606 - the kernel. Possibilities exist to use them at the moment for
607 - temporary storage but it isn't recommended.
577 + the kernel. Possibilities exist to use them at the moment for
578 + temporary storage but it isn't recommended.
608 579 4) Only 4 of the floating point registers are used for
609 - parameter passing as older machines such as G3 only have only 4
610 - & it keeps the stack frame compatible with other compilers.
611 - However with IEEE floating point emulation under linux on the
612 - older machines you are free to use the other 12.
613 - 5) A long long or double parameter cannot be have the
614 - first 4 bytes in a register & the second four bytes in the
615 - outgoing args area. It must be purely in the outgoing args
616 - area if crossing this boundary.
580 + parameter passing as older machines such as G3 only have only 4
581 + & it keeps the stack frame compatible with other compilers.
582 + However with IEEE floating point emulation under linux on the
583 + older machines you are free to use the other 12.
584 + 5) A long long or double parameter cannot be have the
585 + first 4 bytes in a register & the second four bytes in the
586 + outgoing args area. It must be purely in the outgoing args
587 + area if crossing this boundary.
617 588 6) Floating point parameters are mixed with outgoing args
618 - on the outgoing args area in the order the are passed in as parameters.
619 - 7) Floating point arguments 2 & 3 are saved in the outgoing args area for
620 - z/Architecture
589 + on the outgoing args area in the order the are passed in as parameters.
590 + 7) Floating point arguments 2 & 3 are saved in the outgoing args area for
591 + z/Architecture
621 592
622 593
623 594 Stack Frame Layout
624 595 ------------------
596 +
597 + ========= ============== ======================================================
625 598 s/390 z/Architecture
626 - 0 0 back chain ( a 0 here signifies end of back chain )
627 - 4 8 eos ( end of stack, not used on Linux for S390 used in other linkage formats )
628 - 8 16 glue used in other s/390 linkage formats for saved routine descriptors etc.
629 - 12 24 glue used in other s/390 linkage formats for saved routine descriptors etc.
630 - 16 32 scratch area
631 - 20 40 scratch area
632 - 24 48 saved r6 of caller function
633 - 28 56 saved r7 of caller function
634 - 32 64 saved r8 of caller function
635 - 36 72 saved r9 of caller function
636 - 40 80 saved r10 of caller function
637 - 44 88 saved r11 of caller function
638 - 48 96 saved r12 of caller function
639 - 52 104 saved r13 of caller function
640 - 56 112 saved r14 of caller function
641 - 60 120 saved r15 of caller function
642 - 64 128 saved f4 of caller function
643 - 72 132 saved f6 of caller function
644 - 80 undefined
645 - 96 160 outgoing args passed from caller to callee
646 - 96+x 160+x possible stack alignment ( 8 bytes desirable )
647 - 96+x+y 160+x+y alloca space of caller ( if used )
648 - 96+x+y+z 160+x+y+z automatics of caller ( if used )
649 - 0 back-chain
599 + ========= ============== ======================================================
600 + 0 0 back chain ( a 0 here signifies end of back chain )
601 + 4 8 eos ( end of stack, not used on Linux for S390 used
602 + in other linkage formats )
603 + 8 16 glue used in other s/390 linkage formats for saved
604 + routine descriptors etc.
605 + 12 24 glue used in other s/390 linkage formats for saved
606 + routine descriptors etc.
607 + 16 32 scratch area
608 + 20 40 scratch area
609 + 24 48 saved r6 of caller function
610 + 28 56 saved r7 of caller function
611 + 32 64 saved r8 of caller function
612 + 36 72 saved r9 of caller function
613 + 40 80 saved r10 of caller function
614 + 44 88 saved r11 of caller function
615 + 48 96 saved r12 of caller function
616 + 52 104 saved r13 of caller function
617 + 56 112 saved r14 of caller function
618 + 60 120 saved r15 of caller function
619 + 64 128 saved f4 of caller function
620 + 72 132 saved f6 of caller function
621 + 80 undefined
622 + 96 160 outgoing args passed from caller to callee
623 + 96+x 160+x possible stack alignment ( 8 bytes desirable )
624 + 96+x+y 160+x+y alloca space of caller ( if used )
625 + 96+x+y+z 160+x+y+z automatics of caller ( if used )
626 + 0 back-chain
627 + ========= ============== ======================================================
650 628
651 629 A sample program with comments.
652 630 ===============================
···
663 623 Comments on the function test
664 624 -----------------------------
665 625 1) It didn't need to set up a pointer to the constant pool gpr13 as it is not
666 - used ( :-( ).
626 + used ( :-( ).
667 627 2) This is a frameless function & no stack is bought.
668 628 3) The compiler was clever enough to recognise that it could return the
669 - value in r2 as well as use it for the passed in parameter ( :-) ).
670 - 4) The basr ( branch relative & save ) trick works as follows the instruction
671 - has a special case with r0,r0 with some instruction operands is understood as
672 - the literal value 0, some risc architectures also do this ). So now
673 - we are branching to the next address & the address new program counter is
674 - in r13,so now we subtract the size of the function prologue we have executed
675 - + the size of the literal pool to get to the top of the literal pool
676 - 0040037c int test(int b)
677 - { # Function prologue below
678 - 40037c: 90 de f0 34 stm %r13,%r14,52(%r15) # Save registers r13 & r14
679 - 400380: 0d d0 basr %r13,%r0 # Set up pointer to constant pool using
680 - 400382: a7 da ff fa ahi %r13,-6 # basr trick
681 - return(5+b);
682 - # Huge main program
683 - 400386: a7 2a 00 05 ahi %r2,5 # add 5 to r2
629 + value in r2 as well as use it for the passed in parameter ( :-) ).
630 + 4) The basr ( branch relative & save ) trick works as follows the instruction
631 + has a special case with r0,r0 with some instruction operands is understood as
632 + the literal value 0, some risc architectures also do this ). So now
633 + we are branching to the next address & the address new program counter is
634 + in r13,so now we subtract the size of the function prologue we have executed
635 + the size of the literal pool to get to the top of the literal pool::
684 636
685 - # Function epilogue below
686 - 40038a: 98 de f0 34 lm %r13,%r14,52(%r15) # restore registers r13 & 14
687 - 40038e: 07 fe br %r14 # return
688 - }
637 +
638 + 0040037c int test(int b)
639 + { # Function prologue below
640 + 40037c: 90 de f0 34 stm %r13,%r14,52(%r15) # Save registers r13 & r14
641 + 400380: 0d d0 basr %r13,%r0 # Set up pointer to constant pool using
642 + 400382: a7 da ff fa ahi %r13,-6 # basr trick
643 + return(5+b);
644 + # Huge main program
645 + 400386: a7 2a 00 05 ahi %r2,5 # add 5 to r2
646 +
647 + # Function epilogue below
648 + 40038a: 98 de f0 34 lm %r13,%r14,52(%r15) # restore registers r13 & 14
649 + 40038e: 07 fe br %r14 # return
650 + }
689 651
690 652 Comments on the function main
691 653 -----------------------------
692 - 1) The compiler did this function optimally ( 8-) )
654 + 1) The compiler did this function optimally ( 8-) )::
693 655
694 - Literal pool for main.
695 - 400390: ff ff ff ec .long 0xffffffec
696 - main(int argc,char *argv[])
697 - { # Function prologue below
698 - 400394: 90 bf f0 2c stm %r11,%r15,44(%r15) # Save necessary registers
699 - 400398: 18 0f lr %r0,%r15 # copy stack pointer to r0
700 - 40039a: a7 fa ff a0 ahi %r15,-96 # Make area for callee saving
701 - 40039e: 0d d0 basr %r13,%r0 # Set up r13 to point to
702 - 4003a0: a7 da ff f0 ahi %r13,-16 # literal pool
703 - 4003a4: 50 00 f0 00 st %r0,0(%r15) # Save backchain
656 + Literal pool for main.
657 + 400390: ff ff ff ec .long 0xffffffec
658 + main(int argc,char *argv[])
659 + { # Function prologue below
660 + 400394: 90 bf f0 2c stm %r11,%r15,44(%r15) # Save necessary registers
661 + 400398: 18 0f lr %r0,%r15 # copy stack pointer to r0
662 + 40039a: a7 fa ff a0 ahi %r15,-96 # Make area for callee saving
663 + 40039e: 0d d0 basr %r13,%r0 # Set up r13 to point to
664 + 4003a0: a7 da ff f0 ahi %r13,-16 # literal pool
665 + 4003a4: 50 00 f0 00 st %r0,0(%r15) # Save backchain
704 666
705 667 return(test(5)); # Main Program Below
706 - 4003a8: 58 e0 d0 00 l %r14,0(%r13) # load relative address of test from
707 - # literal pool
708 - 4003ac: a7 28 00 05 lhi %r2,5 # Set first parameter to 5
709 - 4003b0: 4d ee d0 00 bas %r14,0(%r14,%r13) # jump to test setting r14 as return
668 + 4003a8: 58 e0 d0 00 l %r14,0(%r13) # load relative address of test from
669 + # literal pool
670 + 4003ac: a7 28 00 05 lhi %r2,5 # Set first parameter to 5
671 + 4003b0: 4d ee d0 00 bas %r14,0(%r14,%r13) # jump to test setting r14 as return
710 672 # address using branch & save instruction.
711 673
712 674 # Function Epilogue below
713 - 4003b4: 98 bf f0 8c lm %r11,%r15,140(%r15)# Restore necessary registers.
714 - 4003b8: 07 fe br %r14 # return to do program exit
715 - }
675 + 4003b4: 98 bf f0 8c lm %r11,%r15,140(%r15)# Restore necessary registers.
676 + 4003b8: 07 fe br %r14 # return to do program exit
677 + }
716 678
717 679
718 680 Compiler updates
719 681 ----------------
720 682
721 - main(int argc,char *argv[])
722 - {
723 - 4004fc: 90 7f f0 1c stm %r7,%r15,28(%r15)
724 - 400500: a7 d5 00 04 bras %r13,400508 <main+0xc>
725 - 400504: 00 40 04 f4 .long 0x004004f4
726 - # compiler now puts constant pool in code to so it saves an instruction
727 - 400508: 18 0f lr %r0,%r15
728 - 40050a: a7 fa ff a0 ahi %r15,-96
729 - 40050e: 50 00 f0 00 st %r0,0(%r15)
683 + ::
684 +
685 + main(int argc,char *argv[])
686 + {
687 + 4004fc: 90 7f f0 1c stm %r7,%r15,28(%r15)
688 + 400500: a7 d5 00 04 bras %r13,400508 <main+0xc>
689 + 400504: 00 40 04 f4 .long 0x004004f4
690 + # compiler now puts constant pool in code to so it saves an instruction
691 + 400508: 18 0f lr %r0,%r15
692 + 40050a: a7 fa ff a0 ahi %r15,-96
693 + 40050e: 50 00 f0 00 st %r0,0(%r15)
730 694 return(test(5));
731 - 400512: 58 10 d0 00 l %r1,0(%r13)
732 - 400516: a7 28 00 05 lhi %r2,5
733 - 40051a: 0d e1 basr %r14,%r1
734 - # compiler adds 1 extra instruction to epilogue this is done to
735 - # avoid processor pipeline stalls owing to data dependencies on g5 &
736 - # above as register 14 in the old code was needed directly after being loaded
737 - # by the lm %r11,%r15,140(%r15) for the br %14.
738 - 40051c: 58 40 f0 98 l %r4,152(%r15)
739 - 400520: 98 7f f0 7c lm %r7,%r15,124(%r15)
740 - 400524: 07 f4 br %r4
741 - }
695 + 400512: 58 10 d0 00 l %r1,0(%r13)
696 + 400516: a7 28 00 05 lhi %r2,5
697 + 40051a: 0d e1 basr %r14,%r1
698 + # compiler adds 1 extra instruction to epilogue this is done to
699 + # avoid processor pipeline stalls owing to data dependencies on g5 &
700 + # above as register 14 in the old code was needed directly after being loaded
701 + # by the lm %r11,%r15,140(%r15) for the br %14.
702 + 40051c: 58 40 f0 98 l %r4,152(%r15)
703 + 400520: 98 7f f0 7c lm %r7,%r15,124(%r15)
704 + 400524: 07 f4 br %r4
705 + }
742 706
743 707
744 708 Hartmut ( our compiler developer ) also has been threatening to take out the
···
753 709 --------------------------------------
754 710
755 711 If you understand the stuff above you'll understand the stuff
756 - below too so I'll avoid repeating myself & just say that
712 + below too so I'll avoid repeating myself & just say that
757 713 some of the instructions have g's on the end of them to indicate
758 - they are 64 bit & the stack offsets are a bigger,
714 + they are 64 bit & the stack offsets are a bigger,
759 715 the only other difference you'll find between 32 & 64 bit is that
760 - we now use f4 & f6 for floating point arguments on 64 bit.
761 - 00000000800005b0 <test>:
762 - int test(int b)
763 - {
716 + we now use f4 & f6 for floating point arguments on 64 bit::
717 +
718 + 00000000800005b0 <test>:
719 + int test(int b)
720 + {
764 721 return(5+b);
765 - 800005b0: a7 2a 00 05 ahi %r2,5
766 - 800005b4: b9 14 00 22 lgfr %r2,%r2 # downcast to integer
767 - 800005b8: 07 fe br %r14
768 - 800005ba: 07 07 bcr 0,%r7
722 + 800005b0: a7 2a 00 05 ahi %r2,5
723 + 800005b4: b9 14 00 22 lgfr %r2,%r2 # downcast to integer
724 + 800005b8: 07 fe br %r14
725 + 800005ba: 07 07 bcr 0,%r7
769 726
770 727
771 - }
728 + }
772 729
773 - 00000000800005bc <main>:
774 - main(int argc,char *argv[])
775 - {
776 - 800005bc: eb bf f0 58 00 24 stmg %r11,%r15,88(%r15)
777 - 800005c2: b9 04 00 1f lgr %r1,%r15
778 - 800005c6: a7 fb ff 60 aghi %r15,-160
779 - 800005ca: e3 10 f0 00 00 24 stg %r1,0(%r15)
730 + 00000000800005bc <main>:
731 + main(int argc,char *argv[])
732 + {
733 + 800005bc: eb bf f0 58 00 24 stmg %r11,%r15,88(%r15)
734 + 800005c2: b9 04 00 1f lgr %r1,%r15
735 + 800005c6: a7 fb ff 60 aghi %r15,-160
736 + 800005ca: e3 10 f0 00 00 24 stg %r1,0(%r15)
780 737 return(test(5));
781 - 800005d0: a7 29 00 05 lghi %r2,5
782 - # brasl allows jumps > 64k & is overkill here bras would do fune
783 - 800005d4: c0 e5 ff ff ff ee brasl %r14,800005b0 <test>
784 - 800005da: e3 40 f1 10 00 04 lg %r4,272(%r15)
785 - 800005e0: eb bf f0 f8 00 04 lmg %r11,%r15,248(%r15)
786 - 800005e6: 07 f4 br %r4
787 - }
738 + 800005d0: a7 29 00 05 lghi %r2,5
739 + # brasl allows jumps > 64k & is overkill here bras would do fune
740 + 800005d4: c0 e5 ff ff ff ee brasl %r14,800005b0 <test>
741 + 800005da: e3 40 f1 10 00 04 lg %r4,272(%r15)
742 + 800005e0: eb bf f0 f8 00 04 lmg %r11,%r15,248(%r15)
743 + 800005e6: 07 f4 br %r4
744 + }
788 745
789 746
790 747
···
794 749 -gdwarf-2 now works it should be considered the default debugging
795 750 format for s/390 & z/Architecture as it is more reliable for debugging
796 751 shared libraries, normal -g debugging works much better now
797 - Thanks to the IBM java compiler developers bug reports.
752 + Thanks to the IBM java compiler developers bug reports.
798 753
799 - This is typically done adding/appending the flags -g or -gdwarf-2 to the
754 + This is typically done adding/appending the flags -g or -gdwarf-2 to the
800 755 CFLAGS & LDFLAGS variables Makefile of the program concerned.
801 756
802 757 If using gdb & you would like accurate displays of registers &
803 - stack traces compile without optimisation i.e make sure
758 + stack traces compile without optimisation i.e make sure
804 759 that there is no -O2 or similar on the CFLAGS line of the Makefile &
805 - the emitted gcc commands, obviously this will produce worse code
760 + the emitted gcc commands, obviously this will produce worse code
806 761 ( not advisable for shipment ) but it is an aid to the debugging process.
807 762
808 763 This aids debugging because the compiler will copy parameters passed in
···
811 766 will not compile without optimisation.
812 767
813 768 Debugging with optimisation has since much improved after fixing
814 - some bugs, please make sure you are using gdb-5.0 or later developed
769 + some bugs, please make sure you are using gdb-5.0 or later developed
815 770 after Nov'2000.
816 771
817 772
···
824 779 Addresses & values in the VM debugger are always hex never decimal
825 780 Address ranges are of the format <HexValue1>-<HexValue2> or
826 781 <HexValue1>.<HexValue2>
827 - For example, the address range 0x2000 to 0x3000 can be described as 2000-3000
782 + For example, the address range 0x2000 to 0x3000 can be described as 2000-3000
828 783 or 2000.1000
829 784
830 785 The VM Debugger is case insensitive.
···
843 798 So if you have an objdump listing by hand, it is quite easy to follow, and if
844 799 you don't have an objdump listing keep a copy of the s/390 Reference Summary
845 800 or alternatively the s/390 principles of operation next to you.
846 - e.g. even I can guess that
801 + e.g. even I can guess that
847 802 0001AFF8' LR 180F CC 0
848 - is a ( load register ) lr r0,r15
803 + is a ( load register ) lr r0,r15
849 804
850 805 Also it is very easy to tell the length of a 390 instruction from the 2 most
851 806 significant bits in the instruction (not that this info is really useful except
852 807 if you are trying to make sense of a hexdump of code).
853 808 Here is a table
809 +
810 + ======================= ==================
854 811 Bits Instruction Length
855 - ------------------------------------------
812 + ======================= ==================
856 813 00 2 Bytes
857 814 01 4 Bytes
858 815 10 4 Bytes
859 816 11 6 Bytes
817 + ======================= ==================
860 818
861 819 The debugger also displays other useful info on the same line such as the
862 820 addresses being operated on destination addresses of branches & condition codes.
863 - e.g.
864 - 00019736' AHI A7DAFF0E CC 1
865 - 000198BA' BRC A7840004 -> 000198C2' CC 0
866 - 000198CE' STM 900EF068 >> 0FA95E78 CC 2
821 + e.g.::
822 +
823 + 00019736' AHI A7DAFF0E CC 1
824 + 000198BA' BRC A7840004 -> 000198C2' CC 0
825 + 000198CE' STM 900EF068 >> 0FA95E78 CC 2
867 826
868 827
869 828
···
875 826 ---------------------------
876 827
877 828 I suppose I'd better mention this before I start
878 - to list the current active traces do
879 - Q TR
829 + to list the current active traces do::
830 +
831 + Q TR
832 +
880 833 there can be a maximum of 255 of these per set
881 834 ( more about trace sets later ).
882 - To stop traces issue a
883 - TR END.
884 - To delete a particular breakpoint issue
885 - TR DEL <breakpoint number>
835 +
836 + To stop traces issue a::
837 +
838 + TR END.
839 +
840 + To delete a particular breakpoint issue::
841 +
842 + TR DEL <breakpoint number>
886 843
887 844 The PA1 key drops to CP mode so you can issue debugger commands,
888 - Doing alt c (on my 3270 console at least ) clears the screen.
845 + Doing alt c (on my 3270 console at least ) clears the screen.
846 +
889 847 hitting b <enter> comes back to the running operating system
890 848 from cp mode ( in our case linux ).
849 +
891 850 It is typically useful to add shortcuts to your profile.exec file
892 851 if you have one ( this is roughly equivalent to autoexec.bat in DOS ).
893 - file here are a few from mine.
894 - /* this gives me command history on issuing f12 */
895 - set pf12 retrieve
896 - /* this continues */
897 - set pf8 imm b
898 - /* goes to trace set a */
899 - set pf1 imm tr goto a
900 - /* goes to trace set b */
901 - set pf2 imm tr goto b
902 - /* goes to trace set c */
903 - set pf3 imm tr goto c
852 + file here are a few from mine::
853 +
854 + /* this gives me command history on issuing f12 */
855 + set pf12 retrieve
856 + /* this continues */
857 + set pf8 imm b
858 + /* goes to trace set a */
859 + set pf1 imm tr goto a
860 + /* goes to trace set b */
861 + set pf2 imm tr goto b
862 + /* goes to trace set c */
863 + set pf3 imm tr goto c
904 864
905 865
906 866
907 867 Instruction Tracing
908 868 -------------------
909 - Setting a simple breakpoint
910 - TR I PSWA <address>
911 - To debug a particular function try
912 - TR I R <function address range>
913 - TR I on its own will single step.
914 - TR I DATA <MNEMONIC> <OPTIONAL RANGE> will trace for particular mnemonics
915 - e.g.
916 - TR I DATA 4D R 0197BC.4000
869 + Setting a simple breakpoint::
870 +
871 + TR I PSWA <address>
872 +
873 + To debug a particular function try::
874 +
875 + TR I R <function address range>
876 + TR I on its own will single step.
877 + TR I DATA <MNEMONIC> <OPTIONAL RANGE> will trace for particular mnemonics
878 +
879 + e.g.::
880 +
881 + TR I DATA 4D R 0197BC.4000
882 +
917 883 will trace for BAS'es ( opcode 4D ) in the range 0197BC.4000
884 +
918 885 if you were inclined you could add traces for all branch instructions &
919 - suffix them with the run prefix so you would have a backtrace on screen
920 - when a program crashes.
921 - TR BR <INTO OR FROM> will trace branches into or out of an address.
922 - e.g.
923 - TR BR INTO 0 is often quite useful if a program is getting awkward & deciding
886 + suffix them with the run prefix so you would have a backtrace on screen
887 + when a program crashes::
888 +
889 + TR BR <INTO OR FROM> will trace branches into or out of an address.
890 +
891 + e.g.::
892 +
893 + TR BR INTO 0
894 +
895 + is often quite useful if a program is getting awkward & deciding
924 896 to branch to 0 & crashing as this will stop at the address before in jumps to 0.
925 - TR I R <address range> RUN cmd d g
897 +
898 + ::
899 +
900 + TR I R <address range> RUN cmd d g
901 +
926 902 single steps a range of addresses but stays running &
927 903 displays the gprs on each step.
928 904
···
955 881
956 882 Displaying & modifying Registers
957 883 --------------------------------
958 - D G will display all the gprs
959 - Adding a extra G to all the commands is necessary to access the full 64 bit
884 + D G
885 + will display all the gprs
886 +
887 + Adding a extra G to all the commands is necessary to access the full 64 bit
960 888 content in VM on z/Architecture. Obviously this isn't required for access
961 889 registers as these are still 32 bit.
962 - e.g. DGG instead of DG
963 - D X will display all the control registers
964 - D AR will display all the access registers
965 - D AR4-7 will display access registers 4 to 7
966 - CPU ALL D G will display the GRPS of all CPUS in the configuration
967 - D PSW will display the current PSW
968 - st PSW 2000 will put the value 2000 into the PSW &
969 - cause crash your machine.
970 - D PREFIX displays the prefix offset
890 +
891 + e.g.
892 + 893 + DGG 894 + instead of DG 895 + 896 + D X 897 + will display all the control registers 898 + D AR 899 + will display all the access registers 900 + D AR4-7 901 + will display access registers 4 to 7 902 + CPU ALL D G 903 + will display the GRPS of all CPUS in the configuration 904 + D PSW 905 + will display the current PSW 906 + st PSW 2000 907 + will put the value 2000 into the PSW & cause crash your machine. 908 + D PREFIX 909 + displays the prefix offset 971 910 972 911 973 912 Displaying Memory 974 913 ----------------- 975 - To display memory mapped using the current PSW's mapping try 976 - D <range> 914 + To display memory mapped using the current PSW's mapping try:: 915 + 916 + D <range> 917 + 977 918 To make VM display a message each time it hits a particular address and 978 - continue try 979 - D I<range> will disassemble/display a range of instructions. 980 - ST addr 32 bit word will store a 32 bit aligned address 981 - D T<range> will display the EBCDIC in an address (if you are that way inclined) 982 - D R<range> will display real addresses ( without DAT ) but with prefixing. 919 + continue try: 920 + 921 + D I<range> 922 + will disassemble/display a range of instructions. 923 + 924 + ST addr 32 bit word 925 + will store a 32 bit aligned address 926 + D T<range> 927 + will display the EBCDIC in an address (if you are that way inclined) 928 + D R<range> 929 + will display real addresses ( without DAT ) but with prefixing. 930 + 983 931 There are other complex options to display if you need to get at say home space 984 932 but are in primary space the easiest thing to do is to temporarily 985 933 modify the PSW to the other addressing mode, display the stuff & then 986 934 restore it. 987 935 988 936 989 - 937 + 990 938 Hints 991 939 ----- 992 940 If you want to issue a debugger command without halting your virtual machine 993 - with the PA1 key try prefixing the command with #CP e.g. 
994 - #cp tr i pswa 2000 941 + with the PA1 key try prefixing the command with #CP e.g.:: 942 + 943 + #cp tr i pswa 2000 944 + 995 945 also suffixing most debugger commands with RUN will cause them not 996 946 to stop just display the mnemonic at the current instruction on the console. 947 + 997 948 If you have several breakpoints you want to put into your program & 998 949 you get fed up of cross referencing with System.map 999 950 you can do the following trick for several symbols. 1000 - grep do_signal System.map 1001 - which emits the following among other things 1002 - 0001f4e0 T do_signal 1003 - now you can do 1004 951 1005 - TR I PSWA 0001f4e0 cmd msg * do_signal 952 + :: 953 + 954 + grep do_signal System.map 955 + 956 + which emits the following among other things:: 957 + 958 + 0001f4e0 T do_signal 959 + 960 + now you can do:: 961 + 962 + TR I PSWA 0001f4e0 cmd msg * do_signal 963 + 1006 964 This sends a message to your own console each time do_signal is entered. 1007 965 ( As an aside I wrote a perl script once which automatically generated a REXX 1008 966 script with breakpoints on every kernel procedure, this isn't a good idea 1009 967 because there are thousands of these routines & VM can only set 255 breakpoints 1010 - at a time so you nearly had to spend as long pruning the file down as you would 968 + at a time so you nearly had to spend as long pruning the file down as you would 1011 969 entering the msgs by hand), however, the trick might be useful for a single 1012 970 object file. In the 3270 terminal emulator x3270 there is a very useful option 1013 971 in the file menu called "Save Screen In File" - this is very good for keeping a 1014 972 copy of traces. 1015 973 1016 - From CMS help <command name> will give you online help on a particular command. 1017 - e.g. 1018 - HELP DISPLAY 974 + From CMS help <command name> will give you online help on a particular command. 
975 + e.g.:: 976 + 977 + HELP DISPLAY 1019 978 1020 979 Also CP has a file called profile.exec which automatically gets called 1021 980 on startup of CMS ( like autoexec.bat ), keeping on a DOS analogy session 1022 981 CP has a feature similar to doskey, it may be useful for you to 1023 - use profile.exec to define some keystrokes. 1024 - e.g. 982 + use profile.exec to define some keystrokes. 983 + 1025 984 SET PF9 IMM B 1026 - This does a single step in VM on pressing F8. 985 + This does a single step in VM on pressing F8. 986 + 1027 987 SET PF10 ^ 1028 - This sets up the ^ key. 1029 - which can be used for ^c (ctrl-c),^z (ctrl-z) which can't be typed directly 1030 - into some 3270 consoles. 988 + This sets up the ^ key. 989 + which can be used for ^c (ctrl-c),^z (ctrl-z) which can't be typed 990 + directly into some 3270 consoles. 991 + 1031 992 SET PF11 ^- 1032 - This types the starting keystrokes for a sysrq see SysRq below. 993 + This types the starting keystrokes for a sysrq see SysRq below. 1033 994 SET PF12 RETRIEVE 1034 - This retrieves command history on pressing F12. 995 + This retrieves command history on pressing F12. 1035 996 1036 997 1037 998 Sometimes in VM the display is set up to scroll automatically this 1038 999 can be very annoying if there are messages you wish to look at 1039 1000 to stop this do 1001 + 1040 1002 TERM MORE 255 255 1041 - This will nearly stop automatic screen updates, however it will 1042 - cause a denial of service if lots of messages go to the 3270 console, 1043 - so it would be foolish to use this as the default on a production machine. 1044 - 1003 + This will nearly stop automatic screen updates, however it will 1004 + cause a denial of service if lots of messages go to the 3270 console, 1005 + so it would be foolish to use this as the default on a production machine. 1006 + 1045 1007 1046 1008 Tracing particular processes 1047 1009 ---------------------------- ··· 1086 976 this simplifies debugging the kernel. 
1087 977 However it is quite common for user processes to have addresses which collide 1088 978 this can make debugging a particular process under VM painful under normal 1089 - circumstances as the process may change when doing a 1090 - TR I R <address range>. 979 + circumstances as the process may change when doing a:: 980 + 981 + TR I R <address range>. 982 + 1091 983 Thankfully after reading VM's online help I figured out how to debug 1092 984 I particular process. 1093 985 1094 986 Your first problem is to find the STD ( segment table designation ) 1095 987 of the program you wish to debug. 1096 988 There are several ways you can do this here are a few 1097 - 1) objdump --syms <program to be debugged> | grep main 1098 - To get the address of main in the program. 1099 - tr i pswa <address of main> 989 + 990 + Run:: 991 + 992 + objdump --syms <program to be debugged> | grep main 993 + 994 + To get the address of main in the program. Then:: 995 + 996 + tr i pswa <address of main> 997 + 1100 998 Start the program, if VM drops to CP on what looks like the entry 1101 999 point of the main function this is most likely the process you wish to debug. 1102 1000 Now do a D X13 or D XG13 on z/Architecture. 1103 - On 31 bit the STD is bits 1-19 ( the STO segment table origin ) 1001 + 1002 + On 31 bit the STD is bits 1-19 ( the STO segment table origin ) 1104 1003 & 25-31 ( the STL segment table length ) of CR13. 1105 - now type 1106 - TR I R STD <CR13's value> 0.7fffffff 1107 - e.g. 1108 - TR I R STD 8F32E1FF 0.7fffffff 1109 - Another very useful variation is 1110 - TR STORE INTO STD <CR13's value> <address range> 1004 + 1005 + now type:: 1006 + 1007 + TR I R STD <CR13's value> 0.7fffffff 1008 + 1009 + e.g.:: 1010 + 1011 + TR I R STD 8F32E1FF 0.7fffffff 1012 + 1013 + Another very useful variation is:: 1014 + 1015 + TR STORE INTO STD <CR13's value> <address range> 1016 + 1111 1017 for finding out when a particular variable changes. 
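The 31 bit STD layout described above (STO in bits 1-19 and STL in bits 25-31 of CR13) can be checked against the example value 8F32E1FF with a few lines of Python. This is only a sketch: it assumes IBM bit numbering, where bit 0 is the most significant bit of the 32 bit word, and the names `field_mask` and `split_std` are made up for illustration.

```python
# Sketch only: split a 31 bit CR13 value into the fields named in the text.
# Assumes IBM bit numbering (bit 0 = most significant bit of the 32 bit word).

def field_mask(first_bit, last_bit, width=32):
    """Mask covering IBM-numbered bits first_bit..last_bit of a width-bit word."""
    nbits = last_bit - first_bit + 1
    return ((1 << nbits) - 1) << (width - 1 - last_bit)

STO_MASK = field_mask(1, 19)   # segment table origin, bits 1-19
STL_MASK = field_mask(25, 31)  # segment table length, bits 25-31

def split_std(cr13):
    """Return (sto, stl): the origin as an address, the length as a raw field."""
    return cr13 & STO_MASK, cr13 & STL_MASK

sto, stl = split_std(0x8F32E1FF)   # the CR13 value used in the example above
print(f"STO={sto:08X} STL={stl:02X}")
```

For the example value this prints `STO=0F32E000 STL=7F`, so the segment table starts at 0F32E000 and the length field is at its maximum.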
1112 1018 1113 - An alternative way of finding the STD of a currently running process 1019 + An alternative way of finding the STD of a currently running process 1114 1020 is to do the following, ( this method is more complex but 1115 1021 could be quite convenient if you aren't updating the kernel much & 1116 1022 so your kernel structures will stay constant for a reasonable period of 1117 1023 time ). 1118 1024 1119 - grep task /proc/<pid>/status 1120 - from this you should see something like 1121 - task: 0f160000 ksp: 0f161de8 pt_regs: 0f161f68 1025 + :: 1026 + 1027 + grep task /proc/<pid>/status 1028 + 1029 + from this you should see something like:: 1030 + 1031 + task: 0f160000 ksp: 0f161de8 pt_regs: 0f161f68 1032 + 1122 1033 This now gives you a pointer to the task structure. 1123 - Now make CC:="s390-gcc -g" kernel/sched.s 1034 + 1035 + Now make:: 1036 + 1037 + CC:="s390-gcc -g" kernel/sched.s 1038 + 1124 1039 To get the task_struct stabinfo. 1040 + 1125 1041 ( task_struct is defined in include/linux/sched.h ). 1042 + 1126 1043 Now we want to look at 1127 1044 task->active_mm->pgd 1045 + 1128 1046 on my machine the active_mm in the task structure stab is 1129 1047 active_mm:(4,12),672,32 1048 + 1130 1049 its offset is 672/8=84=0x54 1050 + 1131 1051 the pgd member in the mm_struct stab is 1132 1052 pgd:(4,6)=*(29,5),96,32 1133 1053 so its offset is 96/8=12=0xc 1134 1054 1135 - so we'll 1136 - hexdump -s 0xf160054 /dev/mem | more 1055 + so we'll:: 1056 + 1057 + hexdump -s 0xf160054 /dev/mem | more 1058 + 1137 1059 i.e. task_struct+active_mm offset 1138 - to look at the active_mm member 1139 - f160054 0fee cc60 0019 e334 0000 0000 0000 0011 1140 - hexdump -s 0x0feecc6c /dev/mem | more 1141 - i.e. active_mm+pgd offset 1142 - feecc6c 0f2c 0000 0000 0001 0000 0001 0000 0010 1060 + to look at the active_mm member:: 1061 + 1062 + f160054 0fee cc60 0019 e334 0000 0000 0000 0011 1063 + 1064 + :: 1065 + 1066 + hexdump -s 0x0feecc6c /dev/mem | more 1067 + 1068 + i.e. 
active_mm+pgd offset:: 1069 + 1070 + feecc6c 0f2c 0000 0000 0001 0000 0001 0000 0010 1071 + 1143 1072 we get something like 1144 - now do 1145 - TR I R STD <pgd|0x7f> 0.7fffffff 1073 + now do:: 1074 + 1075 + TR I R STD <pgd|0x7f> 0.7fffffff 1076 + 1146 1077 i.e. the 0x7f is added because the pgd only 1147 1078 gives the page table origin & we need to set the low bits 1148 1079 to the maximum possible segment table length. 1149 - TR I R STD 0f2c007f 0.7fffffff 1150 - on z/Architecture you'll probably need to do 1151 - TR I R STD <pgd|0x7> 0.ffffffffffffffff 1080 + 1081 + :: 1082 + 1083 + TR I R STD 0f2c007f 0.7fffffff 1084 + 1085 + on z/Architecture you'll probably need to do:: 1086 + 1087 + TR I R STD <pgd|0x7> 0.ffffffffffffffff 1088 + 1152 1089 to set the TableType to 0x1 & the Table length to 3. 1153 1090 1154 1091 ··· 1208 1051 option. 1209 1052 1210 1053 1211 - The most common ones you will normally be tracing for is 1212 - 1=operation exception 1213 - 2=privileged operation exception 1214 - 4=protection exception 1215 - 5=addressing exception 1216 - 6=specification exception 1217 - 10=segment translation exception 1218 - 11=page translation exception 1054 + The most common ones you will normally be tracing for is: 1055 + 1056 + - 1=operation exception 1057 + - 2=privileged operation exception 1058 + - 4=protection exception 1059 + - 5=addressing exception 1060 + - 6=specification exception 1061 + - 10=segment translation exception 1062 + - 11=page translation exception 1219 1063 1220 1064 The full list of these is on page 22 of the current s/390 Reference Summary. 1221 1065 e.g. 1066 + 1222 1067 tr prog 10 will trace segment translation exceptions. 1068 + 1223 1069 tr prog on its own will trace all program interruption codes. 1224 1070 1225 1071 Trace Sets 1226 1072 ---------- 1227 1073 On starting VM you are initially in the INITIAL trace set. 1228 1074 You can do a Q TR to verify this. 
1229 - If you have a complex tracing situation where you wish to wait for instance 1075 + If you have a complex tracing situation where you wish to wait for instance 1230 1076 till a driver is open before you start tracing IO, but know in your 1231 1077 heart that you are going to have to make several runs through the code till you 1232 - have a clue whats going on. 1078 + have a clue whats going on. 1233 1079 1234 - What you can do is 1235 - TR I PSWA <Driver open address> 1080 + What you can do is:: 1081 + 1082 + TR I PSWA <Driver open address> 1083 + 1236 1084 hit b to continue till breakpoint 1085 + 1237 1086 reach the breakpoint 1238 - now do your 1239 - TR GOTO B 1240 - TR IO 7c08-7c09 inst int run 1087 + 1088 + now do your:: 1089 + 1090 + TR GOTO B 1091 + TR IO 7c08-7c09 inst int run 1092 + 1241 1093 or whatever the IO channels you wish to trace are & hit b 1242 1094 1243 - To got back to the initial trace set do 1244 - TR GOTO INITIAL 1095 + To got back to the initial trace set do:: 1096 + 1097 + TR GOTO INITIAL 1098 + 1245 1099 & the TR I PSWA <Driver open address> will be the only active breakpoint again. 1246 1100 1247 1101 ··· 1261 1093 Syscalls are implemented on Linux for S390 by the Supervisor call instruction 1262 1094 (SVC). There 256 possibilities of these as the instruction is made up of a 0xA 1263 1095 opcode and the second byte being the syscall number. They are traced using the 1264 - simple command: 1265 - TR SVC <Optional value or range> 1096 + simple command:: 1097 + 1098 + TR SVC <Optional value or range> 1099 + 1266 1100 the syscalls are defined in linux/arch/s390/include/asm/unistd.h 1267 - e.g. to trace all file opens just do 1268 - TR SVC 5 ( as this is the syscall number of open ) 1101 + e.g. 
to trace all file opens just do:: 1102 + 1103 + TR SVC 5 ( as this is the syscall number of open ) 1269 1104 1270 1105 1271 1106 SMP Specific commands ··· 1276 1105 To find out how many cpus you have 1277 1106 Q CPUS displays all the CPU's available to your virtual machine 1278 1107 To find the cpu that the current cpu VM debugger commands are being directed at 1279 - do Q CPU to change the current cpu VM debugger commands are being directed at do 1280 - CPU <desired cpu no> 1108 + do Q CPU to change the current cpu VM debugger commands are being directed at 1109 + do:: 1110 + 1111 + CPU <desired cpu no> 1281 1112 1282 1113 On a SMP guest issue a command to all CPUs try prefixing the command with cpu 1283 - all. To issue a command to a particular cpu try cpu <cpu number> e.g. 1284 - CPU 01 TR I R 2000.3000 1114 + all. To issue a command to a particular cpu try cpu <cpu number> e.g.:: 1115 + 1116 + CPU 01 TR I R 2000.3000 1117 + 1285 1118 If you are running on a guest with several cpus & you have a IO related problem 1286 1119 & cannot follow the flow of code but you know it isn't smp related. 1287 - from the bash prompt issue 1288 - shutdown -h now or halt. 1289 - do a Q CPUS to find out how many cpus you have 1290 - detach each one of them from cp except cpu 0 1291 - by issuing a 1292 - DETACH CPU 01-(number of cpus in configuration) 1120 + 1121 + from the bash prompt issue:: 1122 + 1123 + shutdown -h now or halt. 1124 + 1125 + do a:: 1126 + 1127 + Q CPUS 1128 + 1129 + to find out how many cpus you have detach each one of them from cp except 1130 + cpu 0 by issuing a:: 1131 + 1132 + DETACH CPU 01-(number of cpus in configuration) 1133 + 1293 1134 & boot linux again. 1294 - TR SIGP will trace inter processor signal processor instructions. 1295 - DEFINE CPU 01-(number in configuration) 1296 - will get your guests cpus back. 1135 + 1136 + TR SIGP 1137 + will trace inter processor signal processor instructions. 
1138 + 1139 + DEFINE CPU 01-(number in configuration) 1140 + will get your guests cpus back. 1297 1141 1298 1142 1299 1143 Help for displaying ascii textstrings 1300 1144 ------------------------------------- 1301 1145 On the very latest VM Nucleus'es VM can now display ascii 1302 - ( thanks Neale for the hint ) by doing 1303 - D TX<lowaddr>.<len> 1304 - e.g. 1305 - D TX0.100 1146 + ( thanks Neale for the hint ) by doing:: 1147 + 1148 + D TX<lowaddr>.<len> 1149 + 1150 + e.g.:: 1151 + 1152 + D TX0.100 1306 1153 1307 1154 Alternatively 1308 1155 ============= ··· 1332 1143 This is quite useful when looking at a parameter passed in as a text string 1333 1144 under VM ( unless you are good at decoding ASCII in your head ). 1334 1145 1335 - e.g. consider tracing an open syscall 1336 - TR SVC 5 1337 - We have stopped at a breakpoint 1338 - 000151B0' SVC 0A05 -> 0001909A' CC 0 1146 + e.g. consider tracing an open syscall:: 1147 + 1148 + TR SVC 5 1149 + 1150 + We have stopped at a breakpoint:: 1151 + 1152 + 000151B0' SVC 0A05 -> 0001909A' CC 0 1339 1153 1340 1154 D 20.8 to check the SVC old psw in the prefix area and see was it from userspace 1341 1155 (for the layout of the prefix area consult the "Fixed Storage Locations" 1342 1156 chapter of the s/390 Reference Summary if you have it available). 1343 - V00000020 070C2000 800151B2 1157 + 1158 + :: 1159 + 1160 + V00000020 070C2000 800151B2 1161 + 1344 1162 The problem state bit wasn't set & it's also too early in the boot sequence 1345 - for it to be a userspace SVC if it was we would have to temporarily switch the 1163 + for it to be a userspace SVC if it was we would have to temporarily switch the 1346 1164 psw to user space addressing so we could get at the first parameter of the open 1347 1165 in gpr2. 
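The two values in the trace above (the instruction 0A05 and the SVC old PSW 070C2000 800151B2) can be decoded by hand, or with a small script such as the sketch below. It assumes the 31 bit PSW layout with IBM bit numbering (bit 0 is the most significant bit): bit 15 of the first word is the problem state bit, and the second word carries the addressing mode bit followed by a 31 bit instruction address. The function names are illustrative only.

```python
# Sketch only: decode the SVC instruction and SVC old PSW shown in the trace.
# Assumes the 31 bit PSW layout with IBM bit numbering (bit 0 = MSB).

PROBLEM_STATE = 1 << (31 - 15)   # bit 15 of the first PSW word
ADDRESS_MASK = 0x7FFFFFFF        # low 31 bits of the second PSW word

def svc_number(insn):
    """Return the SVC number from a 2 byte instruction, or None if not an SVC."""
    opcode, number = insn >> 8, insn & 0xFF
    return number if opcode == 0x0A else None

def decode_old_psw(word0, word1):
    """Return (in_problem_state, next_instruction_address)."""
    return bool(word0 & PROBLEM_STATE), word1 & ADDRESS_MASK

print(svc_number(0x0A05))
user, addr = decode_old_psw(0x070C2000, 0x800151B2)
print(user, f"{addr:08X}")
```

This prints `5` and then `False 000151B2`: the call is syscall 5 (open), the problem state bit is clear so the SVC came from the kernel, and the saved address is the instruction just after the 2 byte SVC at 000151B0.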
1348 - Next do a 1349 - D G2 1350 - GPR 2 = 00014CB4 1351 - Now display what gpr2 is pointing to 1352 - D 00014CB4.20 1353 - V00014CB4 2F646576 2F636F6E 736F6C65 00001BF5 1354 - V00014CC4 FC00014C B4001001 E0001000 B8070707 1166 + 1167 + Next do a:: 1168 + 1169 + D G2 1170 + GPR 2 = 00014CB4 1171 + 1172 + Now display what gpr2 is pointing to:: 1173 + 1174 + D 00014CB4.20 1175 + V00014CB4 2F646576 2F636F6E 736F6C65 00001BF5 1176 + V00014CC4 FC00014C B4001001 E0001000 B8070707 1177 + 1355 1178 Now copy the text till the first 00 hex ( which is the end of the string 1356 - to an xterm & do hex2ascii on it. 1357 - hex2ascii 2F646576 2F636F6E 736F6C65 00 1358 - outputs 1359 - Decoded Hex:=/ d e v / c o n s o l e 0x00 1179 + to an xterm & do hex2ascii on it:: 1180 + 1181 + hex2ascii 2F646576 2F636F6E 736F6C65 00 1182 + 1183 + outputs:: 1184 + 1185 + Decoded Hex:=/ d e v / c o n s o l e 0x00 1186 + 1360 1187 We were opening the console device, 1361 1188 1362 1189 You can compile the code below yourself for practice :-), 1363 - /* 1364 - * hex2ascii.c 1365 - * a useful little tool for converting a hexadecimal command line to ascii 1366 - * 1367 - * Author(s): Denis Joseph Barrow (djbarrow@de.ibm.com,barrow_dj@yahoo.com) 1368 - * (C) 2000 IBM Deutschland Entwicklung GmbH, IBM Corporation. 1369 - */ 1370 - #include <stdio.h> 1371 1190 1372 - int main(int argc,char *argv[]) 1373 - { 1374 - int cnt1,cnt2,len,toggle=0; 1375 - int startcnt=1; 1376 - unsigned char c,hex; 1377 - 1378 - if(argc>1&&(strcmp(argv[1],"-a")==0)) 1379 - startcnt=2; 1380 - printf("Decoded Hex:="); 1381 - for(cnt1=startcnt;cnt1<argc;cnt1++) 1191 + :: 1192 + 1193 + /* 1194 + * hex2ascii.c 1195 + * a useful little tool for converting a hexadecimal command line to ascii 1196 + * 1197 + * Author(s): Denis Joseph Barrow (djbarrow@de.ibm.com,barrow_dj@yahoo.com) 1198 + * (C) 2000 IBM Deutschland Entwicklung GmbH, IBM Corporation. 
1199 + */ 1200 + #include <stdio.h> 1201 + 1202 + int main(int argc,char *argv[]) 1382 1203 { 1383 - len=strlen(argv[cnt1]); 1384 - for(cnt2=0;cnt2<len;cnt2++) 1204 + int cnt1,cnt2,len,toggle=0; 1205 + int startcnt=1; 1206 + unsigned char c,hex; 1207 + 1208 + if(argc>1&&(strcmp(argv[1],"-a")==0)) 1209 + startcnt=2; 1210 + printf("Decoded Hex:="); 1211 + for(cnt1=startcnt;cnt1<argc;cnt1++) 1385 1212 { 1386 - c=argv[cnt1][cnt2]; 1387 - if(c>='0'&&c<='9') 1213 + len=strlen(argv[cnt1]); 1214 + for(cnt2=0;cnt2<len;cnt2++) 1215 + { 1216 + c=argv[cnt1][cnt2]; 1217 + if(c>='0'&&c<='9') 1388 1218 c=c-'0'; 1389 - if(c>='A'&&c<='F') 1219 + if(c>='A'&&c<='F') 1390 1220 c=c-'A'+10; 1391 - if(c>='a'&&c<='f') 1221 + if(c>='a'&&c<='f') 1392 1222 c=c-'a'+10; 1393 - switch(toggle) 1394 - { 1223 + switch(toggle) 1224 + { 1395 1225 case 0: 1396 1226 hex=c<<4; 1397 1227 toggle=1; ··· 1432 1224 } 1433 1225 toggle=0; 1434 1226 break; 1435 - } 1227 + } 1228 + } 1436 1229 } 1230 + printf("\n"); 1437 1231 } 1438 - printf("\n"); 1439 - } 1440 1232 1441 1233 1442 1234 ··· 1456 1248 1) A kernel address should be easy to recognise since it is in 1457 1249 primary space & the problem state bit isn't set & also 1458 1250 The Hi bit of the address is set. 1459 - 2) Another backchain should also be easy to recognise since it is an 1251 + 2) Another backchain should also be easy to recognise since it is an 1460 1252 address pointing to another address approximately 100 bytes or 0x70 hex 1461 1253 behind the current stackpointer. 1462 1254 1463 1255 1464 1256 Here is some practice. 
1257 + 1465 1258 boot the kernel & hit PA1 at some random time 1466 - d g to display the gprs, this should display something like 1467 - GPR 0 = 00000001 00156018 0014359C 00000000 1468 - GPR 4 = 00000001 001B8888 000003E0 00000000 1469 - GPR 8 = 00100080 00100084 00000000 000FE000 1470 - GPR 12 = 00010400 8001B2DC 8001B36A 000FFED8 1259 + 1260 + d g to display the gprs, this should display something like:: 1261 + 1262 + GPR 0 = 00000001 00156018 0014359C 00000000 1263 + GPR 4 = 00000001 001B8888 000003E0 00000000 1264 + GPR 8 = 00100080 00100084 00000000 000FE000 1265 + GPR 12 = 00010400 8001B2DC 8001B36A 000FFED8 1266 + 1471 1267 Note that GPR14 is a return address but as we are real men we are going to 1472 1268 trace the stack. 1473 - display 0x40 bytes after the stack pointer. 1269 + display 0x40 bytes after the stack pointer:: 1474 1270 1475 - V000FFED8 000FFF38 8001B838 80014C8E 000FFF38 1476 - V000FFEE8 00000000 00000000 000003E0 00000000 1477 - V000FFEF8 00100080 00100084 00000000 000FE000 1478 - V000FFF08 00010400 8001B2DC 8001B36A 000FFED8 1271 + V000FFED8 000FFF38 8001B838 80014C8E 000FFF38 1272 + V000FFEE8 00000000 00000000 000003E0 00000000 1273 + V000FFEF8 00100080 00100084 00000000 000FE000 1274 + V000FFF08 00010400 8001B2DC 8001B36A 000FFED8 1479 1275 1480 1276 1481 1277 Ah now look at whats in sp+56 (sp+0x38) this is 8001B36A our saved r14 if 1482 1278 you look above at our stackframe & also agrees with GPR14. 1483 1279 1484 - now backchain 1485 - d 000FFF38.40 1486 - we now are taking the contents of SP to get our first backchain. 
1280 + now backchain:: 1487 1281 1488 - V000FFF38 000FFFA0 00000000 00014995 00147094 1489 - V000FFF48 00147090 001470A0 000003E0 00000000 1490 - V000FFF58 00100080 00100084 00000000 001BF1D0 1491 - V000FFF68 00010400 800149BA 80014CA6 000FFF38 1282 + d 000FFF38.40 1283 + 1284 + we now are taking the contents of SP to get our first backchain:: 1285 + 1286 + V000FFF38 000FFFA0 00000000 00014995 00147094 1287 + V000FFF48 00147090 001470A0 000003E0 00000000 1288 + V000FFF58 00100080 00100084 00000000 001BF1D0 1289 + V000FFF68 00010400 800149BA 80014CA6 000FFF38 1492 1290 1493 1291 This displays a 2nd return address of 80014CA6 1494 1292 1495 - now do d 000FFFA0.40 for our 3rd backchain 1293 + now do:: 1496 1294 1497 - V000FFFA0 04B52002 0001107F 00000000 00000000 1498 - V000FFFB0 00000000 00000000 FF000000 0001107F 1499 - V000FFFC0 00000000 00000000 00000000 00000000 1500 - V000FFFD0 00010400 80010802 8001085A 000FFFA0 1295 + d 000FFFA0.40 1296 + 1297 + for our 3rd backchain:: 1298 + 1299 + V000FFFA0 04B52002 0001107F 00000000 00000000 1300 + V000FFFB0 00000000 00000000 FF000000 0001107F 1301 + V000FFFC0 00000000 00000000 00000000 00000000 1302 + V000FFFD0 00010400 80010802 8001085A 000FFFA0 1501 1303 1502 1304 1503 1305 our 3rd return address is 8001085A ··· 1515 1297 as the 04B52002 looks suspiciously like rubbish it is fair to assume that the 1516 1298 kernel entry routines for the sake of optimisation don't set up a backchain. 1517 1299 1518 - now look at System.map to see if the addresses make any sense. 
1300 + now look at System.map to see if the addresses make any sense:: 1519 1301 1520 - grep -i 0001b3 System.map 1521 - outputs among other things 1522 - 0001b304 T cpu_idle 1302 + grep -i 0001b3 System.map 1303 + 1304 + outputs among other things:: 1305 + 1306 + 0001b304 T cpu_idle 1307 + 1523 1308 so 8001B36A 1524 1309 is cpu_idle+0x66 ( quiet the cpu is asleep, don't wake it ) 1525 1310 1311 + :: 1526 1312 1527 - grep -i 00014 System.map 1528 - produces among other things 1529 - 00014a78 T start_kernel 1313 + grep -i 00014 System.map 1314 + 1315 + produces among other things:: 1316 + 1317 + 00014a78 T start_kernel 1318 + 1530 1319 so 0014CA6 is start_kernel+some hex number I can't add in my head. 1531 1320 1532 - grep -i 00108 System.map 1533 - this produces 1534 - 00010800 T _stext 1321 + :: 1322 + 1323 + grep -i 00108 System.map 1324 + 1325 + this produces:: 1326 + 1327 + 00010800 T _stext 1328 + 1535 1329 so 8001085A is _stext+0x5a 1536 1330 1537 1331 Congrats you've done your first backchain. ··· 1567 1337 Here is some of the common IO terminology: 1568 1338 1569 1339 Subchannel: 1570 - This is the logical number most IO commands use to talk to an IO device. There 1571 - can be up to 0x10000 (65536) of these in a configuration, typically there are a 1572 - few hundred. Under VM for simplicity they are allocated contiguously, however 1573 - on the native hardware they are not. They typically stay consistent between 1574 - boots provided no new hardware is inserted or removed. 1575 - Under Linux for s390 we use these as IRQ's and also when issuing an IO command 1576 - (CLEAR SUBCHANNEL, HALT SUBCHANNEL, MODIFY SUBCHANNEL, RESUME SUBCHANNEL, 1577 - START SUBCHANNEL, STORE SUBCHANNEL and TEST SUBCHANNEL). We use this as the ID 1578 - of the device we wish to talk to. The most important of these instructions are 1579 - START SUBCHANNEL (to start IO), TEST SUBCHANNEL (to check whether the IO 1580 - completed successfully) and HALT SUBCHANNEL (to kill IO). 
A subchannel can have 1581 - up to 8 channel paths to a device, this offers redundancy if one is not 1582 - available. 1340 + This is the logical number most IO commands use to talk to an IO device. There 1341 + can be up to 0x10000 (65536) of these in a configuration, typically there are a 1342 + few hundred. Under VM for simplicity they are allocated contiguously, however 1343 + on the native hardware they are not. They typically stay consistent between 1344 + boots provided no new hardware is inserted or removed. 1345 + 1346 + Under Linux for s390 we use these as IRQ's and also when issuing an IO command 1347 + (CLEAR SUBCHANNEL, HALT SUBCHANNEL, MODIFY SUBCHANNEL, RESUME SUBCHANNEL, 1348 + START SUBCHANNEL, STORE SUBCHANNEL and TEST SUBCHANNEL). We use this as the ID 1349 + of the device we wish to talk to. The most important of these instructions are 1350 + START SUBCHANNEL (to start IO), TEST SUBCHANNEL (to check whether the IO 1351 + completed successfully) and HALT SUBCHANNEL (to kill IO). A subchannel can have 1352 + up to 8 channel paths to a device, this offers redundancy if one is not 1353 + available. 1583 1354 1584 1355 Device Number: 1585 - This number remains static and is closely tied to the hardware. There are 65536 1586 - of these, made up of a CHPID (Channel Path ID, the most significant 8 bits) and 1587 - another lsb 8 bits. These remain static even if more devices are inserted or 1588 - removed from the hardware. There is a 1 to 1 mapping between subchannels and 1589 - device numbers, provided devices aren't inserted or removed. 1356 + This number remains static and is closely tied to the hardware. There are 65536 1357 + of these, made up of a CHPID (Channel Path ID, the most significant 8 bits) and 1358 + another lsb 8 bits. These remain static even if more devices are inserted or 1359 + removed from the hardware. There is a 1 to 1 mapping between subchannels and 1360 + device numbers, provided devices aren't inserted or removed. 
1590 1361 1591 1362 Channel Control Words: 1592 - CCWs are linked lists of instructions initially pointed to by an operation 1593 - request block (ORB), which is initially given to Start Subchannel (SSCH) 1594 - command along with the subchannel number for the IO subsystem to process 1595 - while the CPU continues executing normal code. 1596 - CCWs come in two flavours, Format 0 (24 bit for backward compatibility) and 1597 - Format 1 (31 bit). These are typically used to issue read and write (and many 1598 - other) instructions. They consist of a length field and an absolute address 1599 - field. 1600 - Each IO typically gets 1 or 2 interrupts, one for channel end (primary status) 1601 - when the channel is idle, and the second for device end (secondary status). 1602 - Sometimes you get both concurrently. You check how the IO went on by issuing a 1603 - TEST SUBCHANNEL at each interrupt, from which you receive an Interruption 1604 - response block (IRB). If you get channel and device end status in the IRB 1605 - without channel checks etc. your IO probably went okay. If you didn't you 1606 - probably need to examine the IRB, extended status word etc. 1607 - If an error occurs, more sophisticated control units have a facility known as 1608 - concurrent sense. This means that if an error occurs Extended sense information 1609 - will be presented in the Extended status word in the IRB. If not you have to 1610 - issue a subsequent SENSE CCW command after the test subchannel. 1363 + CCWs are linked lists of instructions initially pointed to by an operation 1364 + request block (ORB), which is initially given to Start Subchannel (SSCH) 1365 + command along with the subchannel number for the IO subsystem to process 1366 + while the CPU continues executing normal code. 1367 + CCWs come in two flavours, Format 0 (24 bit for backward compatibility) and 1368 + Format 1 (31 bit). These are typically used to issue read and write (and many 1369 + other) instructions. 
They consist of a length field and an absolute address 1370 + field. 1371 + 1372 + Each IO typically gets 1 or 2 interrupts, one for channel end (primary status) 1373 + when the channel is idle, and the second for device end (secondary status). 1374 + Sometimes you get both concurrently. You check how the IO went on by issuing a 1375 + TEST SUBCHANNEL at each interrupt, from which you receive an Interruption 1376 + response block (IRB). If you get channel and device end status in the IRB 1377 + without channel checks etc. your IO probably went okay. If you didn't you 1378 + probably need to examine the IRB, extended status word etc. 1379 + If an error occurs, more sophisticated control units have a facility known as 1380 + concurrent sense. This means that if an error occurs Extended sense information 1381 + will be presented in the Extended status word in the IRB. If not you have to 1382 + issue a subsequent SENSE CCW command after the test subchannel. 1611 1383 1612 1384 1613 1385 TPI (Test pending interrupt) can also be used for polled IO, but in ··· 1620 1388 operating characteristics of a subchannel (e.g. channel paths). 1621 1389 1622 1390 Other IO related Terms: 1623 - Sysplex: S390's Clustering Technology 1624 - QDIO: S390's new high speed IO architecture to support devices such as gigabit 1625 - ethernet, this architecture is also designed to be forward compatible with 1626 - upcoming 64 bit machines. 1391 + 1392 + Sysplex: 1393 + S390's Clustering Technology 1394 + QDIO: 1395 + S390's new high speed IO architecture to support devices such as gigabit 1396 + ethernet, this architecture is also designed to be forward compatible with 1397 + upcoming 64 bit machines. 
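To make the Format 1 CCW mentioned under Channel Control Words above a little more concrete, here is a rough sketch that packs one into its 8 byte big-endian form. The text only names the length and address fields; the command code and flags bytes, and the exact field order used here, are assumptions taken from the usual architecture description, and the sample values are hypothetical.

```python
import struct

# Sketch only: pack one Format 1 CCW into 8 big-endian bytes.
# Assumed layout: command code (1 byte), flags (1 byte), count (2 bytes),
# data address (4 bytes, top bit zero because addresses are 31 bit).
CCW1 = struct.Struct(">BBHI")

def pack_ccw1(cmd_code, flags, count, data_address):
    """Return the 8 byte image of a Format 1 CCW."""
    if data_address >> 31:
        raise ValueError("Format 1 CCW data address must fit in 31 bits")
    return CCW1.pack(cmd_code, flags, count, data_address)

# Hypothetical values for illustration only.
ccw = pack_ccw1(cmd_code=0x02, flags=0x00, count=0x0100, data_address=0x00014CB4)
print(ccw.hex())
```

With these sample values the script prints `0200010000014cb4`. A Format 0 CCW differs in that it only has room for a 24 bit address, which is why it is kept for backward compatibility.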
1627 1398 1628 1399 1629 - General Concepts 1400 + General Concepts 1401 + ---------------- 1630 1402 1631 1403 Input Output Processors (IOP's) are responsible for communicating between 1632 1404 the mainframe CPU's & the channel & relieve the mainframe CPU's from the 1633 - burden of communicating with IO devices directly, this allows the CPU's to 1634 - concentrate on data processing. 1405 + burden of communicating with IO devices directly, this allows the CPU's to 1406 + concentrate on data processing. 1635 1407 1636 - IOP's can use one or more links ( known as channel paths ) to talk to each 1408 + IOP's can use one or more links ( known as channel paths ) to talk to each 1637 1409 IO device. It first checks for path availability & chooses an available one, 1638 1410 then starts ( & sometimes terminates IO ). 1639 1411 There are two types of channel path: ESCON & the Parallel IO interface. 1640 1412 1641 1413 IO devices are attached to control units, control units provide the 1642 - logic to interface the channel paths & channel path IO protocols to 1414 + logic to interface the channel paths & channel path IO protocols to 1643 1415 the IO devices, they can be integrated with the devices or housed separately 1644 - & often talk to several similar devices ( typical examples would be raid 1645 - controllers or a control unit which connects to 1000 3270 terminals ). 
1416 + & often talk to several similar devices ( typical examples would be raid 1417 + controllers or a control unit which connects to 1000 3270 terminals ):: 1646 1418 1647 1419 1648 - +---------------------------------------------------------------+ 1649 - | +-----+ +-----+ +-----+ +-----+ +----------+ +----------+ | 1650 - | | CPU | | CPU | | CPU | | CPU | | Main | | Expanded | | 1651 - | | | | | | | | | | Memory | | Storage | | 1652 - | +-----+ +-----+ +-----+ +-----+ +----------+ +----------+ | 1653 - |---------------------------------------------------------------+ 1654 - | IOP | IOP | IOP | 1655 - |--------------------------------------------------------------- 1656 - | C | C | C | C | C | C | C | C | C | C | C | C | C | C | C | C | 1657 - ---------------------------------------------------------------- 1658 - || || 1659 - || Bus & Tag Channel Path || ESCON 1660 - || ====================== || Channel 1661 - || || || || Path 1662 - +----------+ +----------+ +----------+ 1663 - | | | | | | 1664 - | CU | | CU | | CU | 1665 - | | | | | | 1666 - +----------+ +----------+ +----------+ 1667 - | | | | | 1668 - +----------+ +----------+ +----------+ +----------+ +----------+ 1669 - |I/O Device| |I/O Device| |I/O Device| |I/O Device| |I/O Device| 1670 - +----------+ +----------+ +----------+ +----------+ +----------+ 1671 - CPU = Central Processing Unit 1672 - C = Channel 1673 - IOP = IP Processor 1674 - CU = Control Unit 1420 + +---------------------------------------------------------------+ 1421 + | +-----+ +-----+ +-----+ +-----+ +----------+ +----------+ | 1422 + | | CPU | | CPU | | CPU | | CPU | | Main | | Expanded | | 1423 + | | | | | | | | | | Memory | | Storage | | 1424 + | +-----+ +-----+ +-----+ +-----+ +----------+ +----------+ | 1425 + |---------------------------------------------------------------+ 1426 + | IOP | IOP | IOP | 1427 + |--------------------------------------------------------------- 1428 + | C | C | C | C | C | C | C | C | C | C | C | C | C 
| C | C | C | 1429 + ---------------------------------------------------------------- 1430 + || || 1431 + || Bus & Tag Channel Path || ESCON 1432 + || ====================== || Channel 1433 + || || || || Path 1434 + +----------+ +----------+ +----------+ 1435 + | | | | | | 1436 + | CU | | CU | | CU | 1437 + | | | | | | 1438 + +----------+ +----------+ +----------+ 1439 + | | | | | 1440 + +----------+ +----------+ +----------+ +----------+ +----------+ 1441 + |I/O Device| |I/O Device| |I/O Device| |I/O Device| |I/O Device| 1442 + +----------+ +----------+ +----------+ +----------+ +----------+ 1443 + CPU = Central Processing Unit 1444 + C = Channel 1445 + IOP = IP Processor 1446 + CU = Control Unit 1675 1447 1676 1448 The 390 IO systems come in 2 flavours the current 390 machines support both 1677 1449 ··· 1683 1447 sometimes called Bus-and Tag & sometimes Original Equipment Manufacturers 1684 1448 Interface (OEMI). 1685 1449 1686 - This byte wide Parallel channel path/bus has parity & data on the "Bus" cable 1450 + This byte wide Parallel channel path/bus has parity & data on the "Bus" cable 1687 1451 and control lines on the "Tag" cable. These can operate in byte multiplex mode 1688 1452 for sharing between several slow devices or burst mode and monopolize the 1689 1453 channel for the whole burst. Up to 256 devices can be addressed on one of these ··· 1695 1459 One of these paths can be daisy chained to up to 8 control units. 1696 1460 1697 1461 1698 - ESCON if fibre optic it is also called FICON 1462 + ESCON if fibre optic it is also called FICON 1699 1463 Was introduced by IBM in 1990. Has 2 fibre optic cables and uses either leds or 1700 1464 lasers for communication at a signaling rate of up to 200 megabits/sec. As 1701 1465 10bits are transferred for every 8 bits info this drops to 160 megabits/sec 1702 1466 and to 18.6 Megabytes/sec once control info and CRC are added. ESCON only 1703 1467 operates in burst mode. 
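The ESCON rate figures quoted above can be checked with a few lines of arithmetic. This is a rough sketch that assumes decimal megabytes (10^6 bytes); the variable names are illustrative, and the remaining overhead is simply derived from the 18.6 Megabytes/sec figure in the text rather than from any protocol specification.

```python
SIGNALLING_MBIT = 200           # raw line rate from the text, megabits/sec
DATA_BITS, LINE_BITS = 8, 10    # 8 info bits carried per 10 bits transferred

# Rate left over once the 8-in-10 encoding is accounted for.
payload_mbit = SIGNALLING_MBIT * DATA_BITS / LINE_BITS
payload_mbyte = payload_mbit / 8

# Figure quoted in the text after control info and CRC are added.
quoted_mbyte = 18.6
overhead = 1 - quoted_mbyte / payload_mbyte

print(f"after encoding: {payload_mbit:.0f} Mbit/s = {payload_mbyte:.0f} MB/s")
print(f"control info and CRC cost roughly {overhead:.1%} of that")
```

So the encoding alone accounts for the drop from 200 to 160 megabits/sec, and control info plus CRC take the rest.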
1704 - 1468 + 1705 1469 ESCONs typical max cable length is 3km for the led version and 20km for the 1706 1470 laser version known as XDF (extended distance facility). This can be further 1707 1471 extended by using an ESCON director which triples the above mentioned ranges. ··· 1725 1489 1726 1490 Now we are ready to go on with IO tracing commands under VM 1727 1491 1728 - A few self explanatory queries: 1729 - Q OSA 1730 - Q CTC 1731 - Q DISK ( This command is CMS specific ) 1732 - Q DASD 1492 + A few self explanatory queries:: 1733 1493 1494 + Q OSA 1495 + Q CTC 1496 + Q DISK ( This command is CMS specific ) 1497 + Q DASD 1734 1498 1499 + Q OSA on my machine returns:: 1735 1500 1736 - 1737 - 1738 - 1739 - Q OSA on my machine returns 1740 - OSA 7C08 ON OSA 7C08 SUBCHANNEL = 0000 1741 - OSA 7C09 ON OSA 7C09 SUBCHANNEL = 0001 1742 - OSA 7C14 ON OSA 7C14 SUBCHANNEL = 0002 1743 - OSA 7C15 ON OSA 7C15 SUBCHANNEL = 0003 1501 + OSA 7C08 ON OSA 7C08 SUBCHANNEL = 0000 1502 + OSA 7C09 ON OSA 7C09 SUBCHANNEL = 0001 1503 + OSA 7C14 ON OSA 7C14 SUBCHANNEL = 0002 1504 + OSA 7C15 ON OSA 7C15 SUBCHANNEL = 0003 1744 1505 1745 1506 If you have a guest with certain privileges you may be able to see devices 1746 1507 which don't belong to you. To avoid this, add the option V. 1747 - e.g. 1748 - Q V OSA 1508 + e.g.:: 1509 + 1510 + Q V OSA 1749 1511 1750 1512 Now using the device numbers returned by this command we will 1751 1513 Trace the io starting up on the first device 7c08 & 7c09 1752 - In our simplest case we can trace the 1514 + In our simplest case we can trace the 1753 1515 start subchannels 1754 1516 like TR SSCH 7C08-7C09 1755 1517 or the halt subchannels ··· 1758 1524 of another VM guest so he can ftp the logfile back to his own machine. I'll do 1759 1525 a small bit of this and give you a look at the output. 
1760 1526 1761 - 1) Spool stdout to VM reader 1762 - SP PRT TO (another vm guest ) or * for the local vm guest 1763 - 2) Fill the reader with the trace 1764 - TR IO 7c08-7c09 INST INT CCW PRT RUN 1765 - 3) Start up linux 1766 - i 00c 1767 - 4) Finish the trace 1768 - TR END 1769 - 5) close the reader 1770 - C PRT 1771 - 6) list reader contents 1772 - RDRLIST 1773 - 7) copy it to linux4's minidisk 1774 - RECEIVE / LOG TXT A1 ( replace 1527 + 1) Spool stdout to VM reader:: 1528 + 1529 + SP PRT TO (another vm guest ) or * for the local vm guest 1530 + 1531 + 2) Fill the reader with the trace:: 1532 + 1533 + TR IO 7c08-7c09 INST INT CCW PRT RUN 1534 + 1535 + 3) Start up linux:: 1536 + 1537 + i 00c 1538 + 4) Finish the trace:: 1539 + 1540 + TR END 1541 + 1542 + 5) close the reader:: 1543 + 1544 + C PRT 1545 + 1546 + 6) list reader contents:: 1547 + 1548 + RDRLIST 1549 + 1550 + 7) copy it to linux4's minidisk:: 1551 + 1552 + RECEIVE / LOG TXT A1 ( replace 1553 + 1775 1554 8) 1776 1555 filel & press F11 to look at it 1777 - You should see something like: 1556 + You should see something like:: 1778 1557 1779 - 00020942' SSCH B2334000 0048813C CC 0 SCH 0000 DEV 7C08 1780 - CPA 000FFDF0 PARM 00E2C9C4 KEY 0 FPI C0 LPM 80 1781 - CCW 000FFDF0 E4200100 00487FE8 0000 E4240100 ........ 1782 - IDAL 43D8AFE8 1783 - IDAL 0FB76000 1784 - 00020B0A' I/O DEV 7C08 -> 000197BC' SCH 0000 PARM 00E2C9C4 1785 - 00021628' TSCH B2354000 >> 00488164 CC 0 SCH 0000 DEV 7C08 1786 - CCWA 000FFDF8 DEV STS 0C SCH STS 00 CNT 00EC 1787 - KEY 0 FPI C0 CC 0 CTLS 4007 1788 - 00022238' STSCH B2344000 >> 00488108 CC 0 SCH 0000 DEV 7C08 1558 + 00020942' SSCH B2334000 0048813C CC 0 SCH 0000 DEV 7C08 1559 + CPA 000FFDF0 PARM 00E2C9C4 KEY 0 FPI C0 LPM 80 1560 + CCW 000FFDF0 E4200100 00487FE8 0000 E4240100 ........ 
1561 + IDAL 43D8AFE8 1562 + IDAL 0FB76000 1563 + 00020B0A' I/O DEV 7C08 -> 000197BC' SCH 0000 PARM 00E2C9C4 1564 + 00021628' TSCH B2354000 >> 00488164 CC 0 SCH 0000 DEV 7C08 1565 + CCWA 000FFDF8 DEV STS 0C SCH STS 00 CNT 00EC 1566 + KEY 0 FPI C0 CC 0 CTLS 4007 1567 + 00022238' STSCH B2344000 >> 00488108 CC 0 SCH 0000 DEV 7C08 1789 1568 1790 1569 If you don't like messing up your readed ( because you possibly booted from it ) 1791 1570 you can alternatively spool it to another readers guest. ··· 1810 1563 been of use to me in the past & may be of use to 1811 1564 you too. For more complete info on each of the commands 1812 1565 use type HELP <command> from CMS. 1813 - detaching devices 1814 - DET <devno range> 1815 - ATT <devno range> <guest> 1816 - attach a device to guest * for your own guest 1817 - READY <devno> cause VM to issue a fake interrupt. 1818 1566 1819 - The VARY command is normally only available to VM administrators. 1820 - VARY ON PATH <path> TO <devno range> 1821 - VARY OFF PATH <PATH> FROM <devno range> 1567 + detaching devices:: 1568 + 1569 + DET <devno range> 1570 + ATT <devno range> <guest> 1571 + 1572 + attach a device to guest * for your own guest 1573 + 1574 + READY <devno> 1575 + cause VM to issue a fake interrupt. 1576 + 1577 + The VARY command is normally only available to VM administrators:: 1578 + 1579 + VARY ON PATH <path> TO <devno range> 1580 + VARY OFF PATH <PATH> FROM <devno range> 1581 + 1822 1582 This is used to switch on or off channel paths to devices. 1823 1583 1824 1584 Q CHPID <channel path ID> 1825 - This displays state of devices using this channel path 1585 + This displays state of devices using this channel path 1586 + 1826 1587 D SCHIB <subchannel> 1827 - This displays the subchannel information SCHIB block for the device. 1828 - this I believe is also only available to administrators. 1588 + This displays the subchannel information SCHIB block for the device. 
1589 + this I believe is also only available to administrators. 1590 + 1829 1591 DEFINE CTC <devno> 1830 - defines a virtual CTC channel to channel connection 1831 - 2 need to be defined on each guest for the CTC driver to use. 1592 + defines a virtual CTC channel to channel connection 1593 + 2 need to be defined on each guest for the CTC driver to use. 1594 + 1832 1595 COUPLE devno userid remote devno 1833 - Joins a local virtual device to a remote virtual device 1834 - ( commonly used for the CTC driver ). 1596 + Joins a local virtual device to a remote virtual device 1597 + ( commonly used for the CTC driver ). 1835 1598 1836 - Building a VM ramdisk under CMS which linux can use 1837 - def vfb-<blocksize> <subchannel> <number blocks> 1599 + Building a VM ramdisk under CMS which linux can use:: 1600 + 1601 + def vfb-<blocksize> <subchannel> <number blocks> 1602 + 1838 1603 blocksize is commonly 4096 for linux. 1839 - Formatting it 1840 - format <subchannel> <driver letter e.g. x> (blksize <blocksize> 1841 1604 1842 - Sharing a disk between multiple guests 1843 - LINK userid devno1 devno2 mode password 1605 + Formatting it:: 1606 + 1607 + format <subchannel> <driver letter e.g. x> (blksize <blocksize> 1608 + 1609 + Sharing a disk between multiple guests:: 1610 + 1611 + LINK userid devno1 devno2 mode password 1844 1612 1845 1613 1846 1614 1847 1615 GDB on S390 1848 1616 =========== 1849 - N.B. if compiling for debugging gdb works better without optimisation 1617 + N.B. if compiling for debugging gdb works better without optimisation 1850 1618 ( see Compiling programs for debugging ) 1851 1619 1852 1620 invocation ··· 1871 1609 Online help 1872 1610 ----------- 1873 1611 help: gives help on commands 1874 - e.g. 1875 - help 1876 - help display 1612 + 1613 + e.g.:: 1614 + 1615 + help 1616 + help display 1617 + 1877 1618 Note gdb's online help is very good use it. 
1878 1619 1879 1620 1880 1621 Assembly 1881 1622 -------- 1882 - info registers: displays registers other than floating point. 1883 - info all-registers: displays floating points as well. 1884 - disassemble: disassembles 1885 - e.g. 1886 - disassemble without parameters will disassemble the current function 1887 - disassemble $pc $pc+10 1623 + info registers: 1624 + displays registers other than floating point. 1625 + 1626 + info all-registers: 1627 + displays floating points as well. 1628 + 1629 + disassemble: 1630 + disassembles 1631 + 1632 + e.g.:: 1633 + 1634 + disassemble without parameters will disassemble the current function 1635 + disassemble $pc $pc+10 1888 1636 1889 1637 Viewing & modifying variables 1890 1638 ----------------------------- 1891 - print or p: displays variable or register 1639 + print or p: 1640 + displays variable or register 1641 + 1892 1642 e.g. p/x $sp will display the stack pointer 1893 1643 1894 - display: prints variable or register each time program stops 1895 - e.g. 1896 - display/x $pc will display the program counter 1897 - display argc 1644 + display: 1645 + prints variable or register each time program stops 1898 1646 1899 - undisplay : undo's display's 1647 + e.g.:: 1900 1648 1901 - info breakpoints: shows all current breakpoints 1649 + display/x $pc will display the program counter 1650 + display argc 1902 1651 1903 - info stack: shows stack back trace (if this doesn't work too well, I'll show 1904 - you the stacktrace by hand below). 1652 + undisplay: 1653 + undo's display's 1905 1654 1906 - info locals: displays local variables. 1655 + info breakpoints: 1656 + shows all current breakpoints 1907 1657 1908 - info args: display current procedure arguments. 1658 + info stack: 1659 + shows stack back trace (if this doesn't work too well, I'll show 1660 + you the stacktrace by hand below). 1909 1661 1910 - set args: will set argc & argv each time the victim program is invoked. 
1662 + info locals: 1663 + displays local variables. 1911 1664 1912 - set <variable>=value 1913 - set argc=100 1914 - set $pc=0 1665 + info args: 1666 + display current procedure arguments. 1667 + 1668 + set args: 1669 + will set argc & argv each time the victim program is invoked 1670 + 1671 + e.g.:: 1672 + 1673 + set <variable>=value 1674 + set argc=100 1675 + set $pc=0 1915 1676 1916 1677 1917 1678 1918 1679 Modifying execution 1919 1680 ------------------- 1920 - step: steps n lines of sourcecode 1921 - step steps 1 line. 1922 - step 100 steps 100 lines of code. 1681 + step: 1682 + steps n lines of sourcecode 1923 1683 1924 - next: like step except this will not step into subroutines 1684 + step 1685 + steps 1 line. 1925 1686 1926 - stepi: steps a single machine code instruction. 1927 - e.g. stepi 100 1687 + step 100 1688 + steps 100 lines of code. 1928 1689 1929 - nexti: steps a single machine code instruction but will not step into 1930 - subroutines. 1690 + next: 1691 + like step except this will not step into subroutines 1931 1692 1932 - finish: will run until exit of the current routine 1693 + stepi: 1694 + steps a single machine code instruction. 1933 1695 1934 - run: (re)starts a program 1696 + e.g.:: 1935 1697 1936 - cont: continues a program 1698 + stepi 100 1937 1699 1938 - quit: exits gdb. 1700 + nexti: 1701 + steps a single machine code instruction but will not step into 1702 + subroutines. 1703 + 1704 + finish: 1705 + will run until exit of the current routine 1706 + 1707 + run: 1708 + (re)starts a program 1709 + 1710 + cont: 1711 + continues a program 1712 + 1713 + quit: 1714 + exits gdb. 1939 1715 1940 1716 1941 1717 breakpoints 1942 1718 ------------ 1943 1719 1944 1720 break 1945 - sets a breakpoint 1946 - e.g. 
1721 + sets a breakpoint 1947 1722 1948 - break main 1723 + e.g.:: 1949 1724 1950 - break *$pc 1951 - 1952 - break *0x400618 1725 + break main 1726 + break *$pc 1727 + break *0x400618 1953 1728 1954 1729 Here's a really useful one for large programs 1730 + 1955 1731 rbr 1956 - Set a breakpoint for all functions matching REGEXP 1957 - e.g. 1958 - rbr 390 1732 + Set a breakpoint for all functions matching REGEXP 1733 + 1734 + e.g.:: 1735 + 1736 + rbr 390 1737 + 1959 1738 will set a breakpoint with all functions with 390 in their name. 1960 1739 1961 1740 info breakpoints 1962 - lists all breakpoints 1741 + lists all breakpoints 1963 1742 1964 - delete: delete breakpoint by number or delete them all 1743 + delete: 1744 + delete breakpoint by number or delete them all 1745 + 1965 1746 e.g. 1966 - delete 1 will delete the first breakpoint 1967 - delete will delete them all 1968 1747 1969 - watch: This will set a watchpoint ( usually hardware assisted ), 1748 + delete 1 1749 + will delete the first breakpoint 1750 + 1751 + 1752 + delete 1753 + will delete them all 1754 + 1755 + watch: 1756 + This will set a watchpoint ( usually hardware assisted ), 1757 + 1970 1758 This will watch a variable till it changes 1759 + 1971 1760 e.g. 1972 - watch cnt, will watch the variable cnt till it changes. 1761 + 1762 + watch cnt 1763 + will watch the variable cnt till it changes. 1764 + 1973 1765 As an aside unfortunately gdb's, architecture independent watchpoint code 1974 1766 is inconsistent & not very good, watchpoints usually work but not always. 1975 1767 1976 - info watchpoints: Display currently active watchpoints 1768 + info watchpoints: 1769 + Display currently active watchpoints 1977 1770 1978 1771 condition: ( another useful one ) 1979 - Specify breakpoint number N to break only if COND is true. 1980 - Usage is `condition N COND', where N is an integer and COND is an 1772 + Specify breakpoint number N to break only if COND is true. 
1773 + 1774 + Usage is `condition N COND`, where N is an integer and COND is an 1981 1775 expression to be evaluated whenever breakpoint N is reached. 1982 1776 1983 1777 ··· 2041 1723 User defined functions/macros 2042 1724 ----------------------------- 2043 1725 define: ( Note this is very very useful,simple & powerful ) 1726 + 2044 1727 usage define <name> <list of commands> end 2045 1728 2046 - examples which you should consider putting into .gdbinit in your home directory 2047 - define d 2048 - stepi 2049 - disassemble $pc $pc+10 2050 - end 1729 + examples which you should consider putting into .gdbinit in your home 1730 + directory:: 2051 1731 2052 - define e 2053 - nexti 2054 - disassemble $pc $pc+10 2055 - end 1732 + define d 1733 + stepi 1734 + disassemble $pc $pc+10 1735 + end 1736 + define e 1737 + nexti 1738 + disassemble $pc $pc+10 1739 + end 2056 1740 2057 1741 2058 1742 Other hard to classify stuff 2059 1743 ---------------------------- 2060 1744 signal n: 2061 - sends the victim program a signal. 2062 - e.g. signal 3 will send a SIGQUIT. 1745 + sends the victim program a signal. 1746 + 1747 + e.g. `signal 3` will send a SIGQUIT. 2063 1748 2064 1749 info signals: 2065 - what gdb does when the victim receives certain signals. 1750 + what gdb does when the victim receives certain signals. 2066 1751 2067 1752 list: 2068 - e.g. 2069 - list lists current function source 2070 - list 1,10 list first 10 lines of current file. 1753 + 1754 + e.g.: 1755 + 1756 + list 1757 + lists current function source 1758 + list 1,10 1759 + list first 10 lines of current file. 1760 + 2071 1761 list test.c:1,10 2072 1762 2073 1763 2074 1764 directory: 2075 - Adds directories to be searched for source if gdb cannot find the source. 2076 - (note it is a bit sensitive about slashes) 2077 - e.g. To add the root of the filesystem to the searchpath do 2078 - directory // 1765 + Adds directories to be searched for source if gdb cannot find the source. 
1766 + (note it is a bit sensitive about slashes) 1767 + 1768 + e.g. To add the root of the filesystem to the searchpath do:: 1769 + 1770 + directory // 2079 1771 2080 1772 2081 1773 call <function> ··· 2093 1765 e.g. 2094 1766 (gdb) call printf("hello world") 2095 1767 outputs: 2096 - $1 = 11 1768 + $1 = 11 2097 1769 2098 1770 You might now be thinking that the line above didn't work, something extra had 2099 1771 to be done. 2100 1772 (gdb) call fflush(stdout) 2101 1773 hello world$2 = 0 2102 - As an aside the debugger also calls malloc & free under the hood 1774 + As an aside the debugger also calls malloc & free under the hood 2103 1775 to make space for the "hello world" string. 2104 1776 2105 1777 2106 1778 2107 1779 hints 2108 1780 ----- 2109 - 1) command completion works just like bash 2110 - ( if you are a bad typist like me this really helps ) 1781 + 1) command completion works just like bash 1782 + ( if you are a bad typist like me this really helps ) 1783 + 2111 1784 e.g. hit br <TAB> & cursor up & down :-). 2112 1785 2113 1786 2) if you have a debugging problem that takes a few steps to recreate 2114 1787 put the steps into a file called .gdbinit in your current working directory 2115 - if you have defined a few extra useful user defined commands put these in 1788 + if you have defined a few extra useful user defined commands put these in 2116 1789 your home directory & they will be read each time gdb is launched. 2117 1790 2118 - A typical .gdbinit file might be. 2119 - break main 2120 - run 2121 - break runtime_exception 2122 - cont 1791 + A typical .gdbinit file might be.:: 1792 + 1793 + break main 1794 + run 1795 + break runtime_exception 1796 + cont 2123 1797 2124 1798 2125 1799 stack chaining in gdb by hand 2126 1800 ----------------------------- 2127 - This is done using a the same trick described for VM 2128 - p/x (*($sp+56))&0x7fffffff get the first backchain. 
1801 + This is done using a the same trick described for VM:: 1802 + 1803 + p/x (*($sp+56))&0x7fffffff 1804 + 1805 + get the first backchain. 2129 1806 2130 1807 For z/Architecture 2131 1808 Replace 56 with 112 & ignore the &0x7fffffff 2132 1809 in the macros below & do nasty casts to longs like the following 2133 1810 as gdb unfortunately deals with printed arguments as ints which 2134 1811 messes up everything. 2135 - i.e. here is a 3rd backchain dereference 2136 - p/x *(long *)(***(long ***)$sp+112) 1812 + 1813 + i.e. here is a 3rd backchain dereference:: 1814 + 1815 + p/x *(long *)(***(long ***)$sp+112) 2137 1816 2138 1817 2139 - this outputs 2140 - $5 = 0x528f18 1818 + this outputs:: 1819 + 1820 + $5 = 0x528f18 1821 + 2141 1822 on my machine. 2142 - Now you can use 2143 - info symbol (*($sp+56))&0x7fffffff 2144 - you might see something like. 2145 - rl_getc + 36 in section .text telling you what is located at address 0x528f18 2146 - Now do. 2147 - p/x (*(*$sp+56))&0x7fffffff 2148 - This outputs 2149 - $6 = 0x528ed0 2150 - Now do. 2151 - info symbol (*(*$sp+56))&0x7fffffff 2152 - rl_read_key + 180 in section .text 2153 - now do 2154 - p/x (*(**$sp+56))&0x7fffffff 1823 + 1824 + Now you can use:: 1825 + 1826 + info symbol (*($sp+56))&0x7fffffff 1827 + 1828 + you might see something like:: 1829 + 1830 + rl_getc + 36 in section .text 1831 + 1832 + telling you what is located at address 0x528f18 1833 + Now do:: 1834 + 1835 + p/x (*(*$sp+56))&0x7fffffff 1836 + 1837 + This outputs:: 1838 + 1839 + $6 = 0x528ed0 1840 + 1841 + Now do:: 1842 + 1843 + info symbol (*(*$sp+56))&0x7fffffff 1844 + rl_read_key + 180 in section .text 1845 + 1846 + now do:: 1847 + 1848 + p/x (*(**$sp+56))&0x7fffffff 1849 + 2155 1850 & so on. 
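The repeated dereferences above follow one pattern, which can be modelled outside gdb. The sketch below is a simplified simulation, not a gdb script: the stack addresses in `fake_mem` are invented, only the two return addresses come from the output shown above, and it assumes the 31-bit layout where word 0 of a frame holds the back chain and offset 56 holds the saved return address (112 on z/Architecture, where the mask is not needed).

```python
ADDR_MASK = 0x7FFFFFFF   # strips the addressing-mode bit, as in the gdb commands
RET_SLOT = 56            # offset used in the gdb commands; 112 on z/Architecture


def walk_backchain(mem, sp, depth):
    """Mimic p/x (*($sp+56))&0x7fffffff, p/x (*(*$sp+56))&0x7fffffff, ..."""
    return_addrs = []
    for _ in range(depth):
        return_addrs.append(mem[sp + RET_SLOT] & ADDR_MASK)
        sp = mem[sp]          # word 0 of the frame is the back chain
        if sp == 0:           # a zero back chain ends the walk
            break
    return return_addrs


# Hypothetical stack contents: address -> 32-bit word.
fake_mem = {
    0x7FFFF000: 0x7FFFF100,        # frame 0 back chain
    0x7FFFF000 + 56: 0x80528F18,   # frame 0 saved return address
    0x7FFFF100: 0,                 # frame 1 back chain (end of chain)
    0x7FFFF100 + 56: 0x80528ED0,   # frame 1 saved return address
}

print([hex(a) for a in walk_backchain(fake_mem, 0x7FFFF000, 3)])
```

Each address returned corresponds to one `info symbol` lookup in the manual procedure.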
2156 1851 2157 1852 Disassembling instructions without debug info 2158 1853 --------------------------------------------- 2159 1854 gdb typically complains if there is a lack of debugging 2160 - symbols in the disassemble command with 1855 + symbols in the disassemble command with 2161 1856 "No function contains specified address." To get around 2162 - this do 2163 - x/<number lines to disassemble>xi <address> 2164 - e.g. 2165 - x/20xi 0x400730 1857 + this do:: 1858 + 1859 + x/<number lines to disassemble>xi <address> 1860 + 1861 + e.g.:: 1862 + 1863 + x/20xi 0x400730 2166 1864 2167 1865 2168 1866 2169 - Note: Remember gdb has history just like bash you don't need to retype the 2170 - whole line just use the up & down arrows. 1867 + Note: 1868 + Remember gdb has history just like bash you don't need to retype the 1869 + whole line just use the up & down arrows. 2171 1870 2172 1871 2173 1872 2174 1873 For more info 2175 1874 ------------- 2176 - From your linuxbox do 2177 - man gdb or info gdb. 1875 + From your linuxbox do:: 1876 + 1877 + man gdb 1878 + 1879 + or:: 1880 + 1881 + info gdb. 2178 1882 2179 1883 core dumps 2180 1884 ---------- 2181 - What a core dump ?, 1885 + 1886 + What a core dump ? 1887 + ^^^^^^^^^^^^^^^^^^ 1888 + 2182 1889 A core dump is a file generated by the kernel (if allowed) which contains the 2183 1890 registers and all active pages of the program which has crashed. 1891 + 2184 1892 From this file gdb will allow you to look at the registers, stack trace and 2185 1893 memory of the program as if it just crashed on your system. It is usually 2186 1894 called core and created in the current working directory. 1895 + 2187 1896 This is very useful in that a customer can mail a core dump to a technical 2188 1897 support department and the technical support department can reconstruct what 2189 1898 happened. 
Provided they have an identical copy of this program with debugging 2190 1899 symbols compiled in and the source base of this build is available. 1900 + 2191 1901 In short it is far more useful than something like a crash log could ever hope 2192 1902 to be. 2193 1903 2194 - Why have I never seen one ?. 2195 - Probably because you haven't used the command 2196 - ulimit -c unlimited in bash 2197 - to allow core dumps, now do 2198 - ulimit -a 1904 + Why have I never seen one ? 1905 + ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1906 + 1907 + Probably because you haven't used the command:: 1908 + 1909 + ulimit -c unlimited in bash 1910 + 1911 + to allow core dumps, now do:: 1912 + 1913 + ulimit -a 1914 + 2199 1915 to verify that the limit was accepted. 2200 1916 2201 1917 A sample core dump 2202 - To create this I'm going to do 2203 - ulimit -c unlimited 2204 - gdb 2205 - to launch gdb (my victim app. ) now be bad & do the following from another 2206 - telnet/xterm session to the same machine 2207 - ps -aux | grep gdb 2208 - kill -SIGSEGV <gdb's pid> 2209 - or alternatively use killall -SIGSEGV gdb if you have the killall command. 2210 - Now look at the core dump. 2211 - ./gdb core 2212 - Displays the following 2213 - GNU gdb 4.18 2214 - Copyright 1998 Free Software Foundation, Inc. 2215 - GDB is free software, covered by the GNU General Public License, and you are 2216 - welcome to change it and/or distribute copies of it under certain conditions. 2217 - Type "show copying" to see the conditions. 2218 - There is absolutely no warranty for GDB. Type "show warranty" for details. 2219 - This GDB was configured as "s390-ibm-linux"... 2220 - Core was generated by `./gdb'. 2221 - Program terminated with signal 11, Segmentation fault. 2222 - Reading symbols from /usr/lib/libncurses.so.4...done. 2223 - Reading symbols from /lib/libm.so.6...done. 2224 - Reading symbols from /lib/libc.so.6...done. 2225 - Reading symbols from /lib/ld-linux.so.2...done. 
2226 - #0 0x40126d1a in read () from /lib/libc.so.6 2227 - Setting up the environment for debugging gdb. 2228 - Breakpoint 1 at 0x4dc6f8: file utils.c, line 471. 2229 - Breakpoint 2 at 0x4d87a4: file top.c, line 2609. 2230 - (top-gdb) info stack 2231 - #0 0x40126d1a in read () from /lib/libc.so.6 2232 - #1 0x528f26 in rl_getc (stream=0x7ffffde8) at input.c:402 2233 - #2 0x528ed0 in rl_read_key () at input.c:381 2234 - #3 0x5167e6 in readline_internal_char () at readline.c:454 2235 - #4 0x5168ee in readline_internal_charloop () at readline.c:507 2236 - #5 0x51692c in readline_internal () at readline.c:521 2237 - #6 0x5164fe in readline (prompt=0x7ffff810) 2238 - at readline.c:349 2239 - #7 0x4d7a8a in command_line_input (prompt=0x564420 "(gdb) ", repeat=1, 2240 - annotation_suffix=0x4d6b44 "prompt") at top.c:2091 2241 - #8 0x4d6cf0 in command_loop () at top.c:1345 2242 - #9 0x4e25bc in main (argc=1, argv=0x7ffffdf4) at main.c:635 1918 + To create this I'm going to do:: 1919 + 1920 + ulimit -c unlimited 1921 + gdb 1922 + 1923 + to launch gdb (my victim app. ) now be bad & do the following from another 1924 + telnet/xterm session to the same machine:: 1925 + 1926 + ps -aux | grep gdb 1927 + kill -SIGSEGV <gdb's pid> 1928 + 1929 + or alternatively use `killall -SIGSEGV gdb` if you have the killall command. 1930 + 1931 + Now look at the core dump:: 1932 + 1933 + ./gdb core 1934 + 1935 + Displays the following:: 1936 + 1937 + GNU gdb 4.18 1938 + Copyright 1998 Free Software Foundation, Inc. 1939 + GDB is free software, covered by the GNU General Public License, and you are 1940 + welcome to change it and/or distribute copies of it under certain conditions. 1941 + Type "show copying" to see the conditions. 1942 + There is absolutely no warranty for GDB. Type "show warranty" for details. 1943 + This GDB was configured as "s390-ibm-linux"... 1944 + Core was generated by `./gdb'. 1945 + Program terminated with signal 11, Segmentation fault. 
1946 + Reading symbols from /usr/lib/libncurses.so.4...done. 1947 + Reading symbols from /lib/libm.so.6...done. 1948 + Reading symbols from /lib/libc.so.6...done. 1949 + Reading symbols from /lib/ld-linux.so.2...done. 1950 + #0 0x40126d1a in read () from /lib/libc.so.6 1951 + Setting up the environment for debugging gdb. 1952 + Breakpoint 1 at 0x4dc6f8: file utils.c, line 471. 1953 + Breakpoint 2 at 0x4d87a4: file top.c, line 2609. 1954 + (top-gdb) info stack 1955 + #0 0x40126d1a in read () from /lib/libc.so.6 1956 + #1 0x528f26 in rl_getc (stream=0x7ffffde8) at input.c:402 1957 + #2 0x528ed0 in rl_read_key () at input.c:381 1958 + #3 0x5167e6 in readline_internal_char () at readline.c:454 1959 + #4 0x5168ee in readline_internal_charloop () at readline.c:507 1960 + #5 0x51692c in readline_internal () at readline.c:521 1961 + #6 0x5164fe in readline (prompt=0x7ffff810) 1962 + at readline.c:349 1963 + #7 0x4d7a8a in command_line_input (prompt=0x564420 "(gdb) ", repeat=1, 1964 + annotation_suffix=0x4d6b44 "prompt") at top.c:2091 1965 + #8 0x4d6cf0 in command_loop () at top.c:1345 1966 + #9 0x4e25bc in main (argc=1, argv=0x7ffffdf4) at main.c:635 2243 1967 2244 1968 2245 1969 LDD ··· 2299 1919 This is a program which lists the shared libraries which a library needs, 2300 1920 Note you also get the relocations of the shared library text segments which 2301 1921 help when using objdump --source. 2302 - e.g. 
2303 - ldd ./gdb 2304 - outputs 2305 - libncurses.so.4 => /usr/lib/libncurses.so.4 (0x40018000) 2306 - libm.so.6 => /lib/libm.so.6 (0x4005e000) 2307 - libc.so.6 => /lib/libc.so.6 (0x40084000) 2308 - /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000) 1922 + 1923 + e.g.:: 1924 + 1925 + ldd ./gdb 1926 + 1927 + outputs:: 1928 + 1929 + libncurses.so.4 => /usr/lib/libncurses.so.4 (0x40018000) 1930 + libm.so.6 => /lib/libm.so.6 (0x4005e000) 1931 + libc.so.6 => /lib/libc.so.6 (0x40084000) 1932 + /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000) 2309 1933 2310 1934 2311 1935 Debugging shared libraries 2312 1936 ========================== 2313 1937 Most programs use shared libraries, however it can be very painful 2314 - when you single step instruction into a function like printf for the 1938 + when you single step instruction into a function like printf for the 2315 1939 first time & you end up in functions like _dl_runtime_resolve this is 2316 - the ld.so doing lazy binding, lazy binding is a concept in ELF where 2317 - shared library functions are not loaded into memory unless they are 1940 + the ld.so doing lazy binding, lazy binding is a concept in ELF where 1941 + shared library functions are not loaded into memory unless they are 2318 1942 actually used, great for saving memory but a pain to debug. 2319 - To get around this either relink the program -static or exit gdb type 2320 - export LD_BIND_NOW=true this will stop lazy binding & restart the gdb'ing 1943 + 1944 + To get around this either relink the program -static or exit gdb type 1945 + export LD_BIND_NOW=true this will stop lazy binding & restart the gdb'ing 2321 1946 the program in question. 2322 - 1947 + 2323 1948 2324 1949 2325 1950 Debugging modules ··· 2340 1955 by the kernel if read, or can be used to modify kernel parameters, 2341 1956 it is a powerful concept. 2342 1957 2343 - e.g. 
1958 + e.g.:: 2344 1959 2345 - cat /proc/sys/net/ipv4/ip_forward 2346 - On my machine outputs 2347 - 0 2348 - telling me ip_forwarding is not on to switch it on I can do 2349 - echo 1 > /proc/sys/net/ipv4/ip_forward 2350 - cat it again 2351 - cat /proc/sys/net/ipv4/ip_forward 2352 - On my machine now outputs 2353 - 1 1960 + cat /proc/sys/net/ipv4/ip_forward 1961 + 1962 + On my machine outputs:: 1963 + 1964 + 0 1965 + 1966 + telling me ip_forwarding is not on to switch it on I can do:: 1967 + 1968 + echo 1 > /proc/sys/net/ipv4/ip_forward 1969 + 1970 + cat it again:: 1971 + 1972 + cat /proc/sys/net/ipv4/ip_forward 1973 + 1974 + On my machine now outputs:: 1975 + 1976 + 1 1977 + 2354 1978 IP forwarding is on. 1979 + 2355 1980 There is a lot of useful info in here best found by going in and having a look 2356 1981 around, so I'll take you through some entries I consider important. 2357 1982 2358 1983 All the processes running on the machine have their own entry defined by 2359 1984 /proc/<pid> 2360 - So lets have a look at the init process 2361 - cd /proc/1 2362 1985 2363 - cat cmdline 2364 - emits 2365 - init [2] 1986 + So lets have a look at the init process:: 2366 1987 2367 - cd /proc/1/fd 1988 + cd /proc/1 1989 + cat cmdline 1990 + 1991 + emits:: 1992 + 1993 + init [2] 1994 + 1995 + :: 1996 + 1997 + cd /proc/1/fd 1998 + 2368 1999 This contains numerical entries of all the open files, 2369 - some of these you can cat e.g. stdout (2) 2000 + some of these you can cat e.g. 
stdout (2):: 2370 2001 2371 - cat /proc/29/maps 2372 - on my machine emits 2002 + cat /proc/29/maps 2373 2003 2374 - 00400000-00478000 r-xp 00000000 5f:00 4103 /bin/bash 2375 - 00478000-0047e000 rw-p 00077000 5f:00 4103 /bin/bash 2376 - 0047e000-00492000 rwxp 00000000 00:00 0 2377 - 40000000-40015000 r-xp 00000000 5f:00 14382 /lib/ld-2.1.2.so 2378 - 40015000-40016000 rw-p 00014000 5f:00 14382 /lib/ld-2.1.2.so 2379 - 40016000-40017000 rwxp 00000000 00:00 0 2380 - 40017000-40018000 rw-p 00000000 00:00 0 2381 - 40018000-4001b000 r-xp 00000000 5f:00 14435 /lib/libtermcap.so.2.0.8 2382 - 4001b000-4001c000 rw-p 00002000 5f:00 14435 /lib/libtermcap.so.2.0.8 2383 - 4001c000-4010d000 r-xp 00000000 5f:00 14387 /lib/libc-2.1.2.so 2384 - 4010d000-40111000 rw-p 000f0000 5f:00 14387 /lib/libc-2.1.2.so 2385 - 40111000-40114000 rw-p 00000000 00:00 0 2386 - 40114000-4011e000 r-xp 00000000 5f:00 14408 /lib/libnss_files-2.1.2.so 2387 - 4011e000-4011f000 rw-p 00009000 5f:00 14408 /lib/libnss_files-2.1.2.so 2388 - 7fffd000-80000000 rwxp ffffe000 00:00 0 2004 + on my machine emits:: 2005 + 2006 + 00400000-00478000 r-xp 00000000 5f:00 4103 /bin/bash 2007 + 00478000-0047e000 rw-p 00077000 5f:00 4103 /bin/bash 2008 + 0047e000-00492000 rwxp 00000000 00:00 0 2009 + 40000000-40015000 r-xp 00000000 5f:00 14382 /lib/ld-2.1.2.so 2010 + 40015000-40016000 rw-p 00014000 5f:00 14382 /lib/ld-2.1.2.so 2011 + 40016000-40017000 rwxp 00000000 00:00 0 2012 + 40017000-40018000 rw-p 00000000 00:00 0 2013 + 40018000-4001b000 r-xp 00000000 5f:00 14435 /lib/libtermcap.so.2.0.8 2014 + 4001b000-4001c000 rw-p 00002000 5f:00 14435 /lib/libtermcap.so.2.0.8 2015 + 4001c000-4010d000 r-xp 00000000 5f:00 14387 /lib/libc-2.1.2.so 2016 + 4010d000-40111000 rw-p 000f0000 5f:00 14387 /lib/libc-2.1.2.so 2017 + 40111000-40114000 rw-p 00000000 00:00 0 2018 + 40114000-4011e000 r-xp 00000000 5f:00 14408 /lib/libnss_files-2.1.2.so 2019 + 4011e000-4011f000 rw-p 00009000 5f:00 14408 /lib/libnss_files-2.1.2.so 2020 + 
7fffd000-80000000 rwxp ffffe000 00:00 0 2389 2021 2390 2022 2391 2023 Showing us the shared libraries init uses where they are in memory 2392 2024 & memory access permissions for each virtual memory area. 2393 2025 2394 2026 /proc/1/cwd is a softlink to the current working directory. 2395 - /proc/1/root is the root of the filesystem for this process. 2027 + 2028 + /proc/1/root is the root of the filesystem for this process. 2396 2029 2397 2030 /proc/1/mem is the current running processes memory which you 2398 2031 can read & write to like a file. 2032 + 2399 2033 strace uses this sometimes as it is a bit faster than the 2400 2034 rather inefficient ptrace interface for peeking at DATA. 2401 2035 2036 + :: 2402 2037 2403 - cat status 2038 + cat status 2404 2039 2405 - Name: init 2406 - State: S (sleeping) 2407 - Pid: 1 2408 - PPid: 0 2409 - Uid: 0 0 0 0 2410 - Gid: 0 0 0 0 2411 - Groups: 2412 - VmSize: 408 kB 2413 - VmLck: 0 kB 2414 - VmRSS: 208 kB 2415 - VmData: 24 kB 2416 - VmStk: 8 kB 2417 - VmExe: 368 kB 2418 - VmLib: 0 kB 2419 - SigPnd: 0000000000000000 2420 - SigBlk: 0000000000000000 2421 - SigIgn: 7fffffffd7f0d8fc 2422 - SigCgt: 00000000280b2603 2423 - CapInh: 00000000fffffeff 2424 - CapPrm: 00000000ffffffff 2425 - CapEff: 00000000fffffeff 2040 + Name: init 2041 + State: S (sleeping) 2042 + Pid: 1 2043 + PPid: 0 2044 + Uid: 0 0 0 0 2045 + Gid: 0 0 0 0 2046 + Groups: 2047 + VmSize: 408 kB 2048 + VmLck: 0 kB 2049 + VmRSS: 208 kB 2050 + VmData: 24 kB 2051 + VmStk: 8 kB 2052 + VmExe: 368 kB 2053 + VmLib: 0 kB 2054 + SigPnd: 0000000000000000 2055 + SigBlk: 0000000000000000 2056 + SigIgn: 7fffffffd7f0d8fc 2057 + SigCgt: 00000000280b2603 2058 + CapInh: 00000000fffffeff 2059 + CapPrm: 00000000ffffffff 2060 + CapEff: 00000000fffffeff 2426 2061 2427 - User PSW: 070de000 80414146 2428 - task: 004b6000 tss: 004b62d8 ksp: 004b7ca8 pt_regs: 004b7f68 2429 - User GPRS: 2430 - 00000400 00000000 0000000b 7ffffa90 2431 - 00000000 00000000 00000000 0045d9f4 2432 - 0045cafc 
7ffffa90 7fffff18 0045cb08 2433 - 00010400 804039e8 80403af8 7ffff8b0 2434 - User ACRS: 2435 - 00000000 00000000 00000000 00000000 2436 - 00000001 00000000 00000000 00000000 2437 - 00000000 00000000 00000000 00000000 2438 - 00000000 00000000 00000000 00000000 2439 - Kernel BackChain CallChain BackChain CallChain 2440 - 004b7ca8 8002bd0c 004b7d18 8002b92c 2441 - 004b7db8 8005cd50 004b7e38 8005d12a 2442 - 004b7f08 80019114 2062 + User PSW: 070de000 80414146 2063 + task: 004b6000 tss: 004b62d8 ksp: 004b7ca8 pt_regs: 004b7f68 2064 + User GPRS: 2065 + 00000400 00000000 0000000b 7ffffa90 2066 + 00000000 00000000 00000000 0045d9f4 2067 + 0045cafc 7ffffa90 7fffff18 0045cb08 2068 + 00010400 804039e8 80403af8 7ffff8b0 2069 + User ACRS: 2070 + 00000000 00000000 00000000 00000000 2071 + 00000001 00000000 00000000 00000000 2072 + 00000000 00000000 00000000 00000000 2073 + 00000000 00000000 00000000 00000000 2074 + Kernel BackChain CallChain BackChain CallChain 2075 + 004b7ca8 8002bd0c 004b7d18 8002b92c 2076 + 004b7db8 8005cd50 004b7e38 8005d12a 2077 + 004b7f08 80019114 2078 + 2443 2079 Showing among other things memory usage & status of some signals & 2444 2080 the processes'es registers from the kernel task_structure 2445 2081 as well as a backchain which may be useful if a process crashes ··· 2473 2067 Some of our drivers now support a "debug feature" in 2474 2068 /proc/s390dbf see s390dbf.txt in the linux/Documentation directory 2475 2069 for more info. 2476 - e.g. 2477 - to switch on the lcs "debug feature" 2478 - echo 5 > /proc/s390dbf/lcs/level 2479 - & then after the error occurred. 2480 - cat /proc/s390dbf/lcs/sprintf >/logfile 2070 + 2071 + e.g. 2072 + to switch on the lcs "debug feature":: 2073 + 2074 + echo 5 > /proc/s390dbf/lcs/level 2075 + 2076 + & then after the error occurred:: 2077 + 2078 + cat /proc/s390dbf/lcs/sprintf >/logfile 2079 + 2481 2080 the logfile now contains some information which may help 2482 2081 tech support resolve a problem in the field. 
2483 2082 ··· 2494 2083 it gives the current state of network drivers. 2495 2084 2496 2085 If you suspect your network device driver is dead 2497 - one way to check is type 2498 - ifconfig <network device> 2086 + one way to check is type:: 2087 + 2088 + ifconfig <network device> 2089 + 2499 2090 e.g. tr0 2500 - You should see something like 2501 - tr0 Link encap:16/4 Mbps Token Ring (New) HWaddr 00:04:AC:20:8E:48 2502 - inet addr:9.164.185.132 Bcast:9.164.191.255 Mask:255.255.224.0 2503 - UP BROADCAST RUNNING MULTICAST MTU:2000 Metric:1 2504 - RX packets:246134 errors:0 dropped:0 overruns:0 frame:0 2505 - TX packets:5 errors:0 dropped:0 overruns:0 carrier:0 2506 - collisions:0 txqueuelen:100 2091 + 2092 + You should see something like:: 2093 + 2094 + ifconfig tr0 2095 + tr0 Link encap:16/4 Mbps Token Ring (New) HWaddr 00:04:AC:20:8E:48 2096 + inet addr:9.164.185.132 Bcast:9.164.191.255 Mask:255.255.224.0 2097 + UP BROADCAST RUNNING MULTICAST MTU:2000 Metric:1 2098 + RX packets:246134 errors:0 dropped:0 overruns:0 frame:0 2099 + TX packets:5 errors:0 dropped:0 overruns:0 carrier:0 2100 + collisions:0 txqueuelen:100 2507 2101 2508 2102 if the device doesn't say up 2509 - try 2510 - /etc/rc.d/init.d/network start 2103 + try:: 2104 + 2105 + /etc/rc.d/init.d/network start 2106 + 2511 2107 ( this starts the network stack & hopefully calls ifconfig tr0 up ). 2512 2108 ifconfig looks at the output of /proc/net/dev and presents it in a more 2513 2109 presentable form. 2110 + 2514 2111 Now ping the device from a machine in the same subnet. 2112 + 2515 2113 if the RX packets count & TX packets counts don't increment you probably 2516 2114 have problems. 2517 - next 2518 - cat /proc/net/arp 2115 + 2116 + next:: 2117 + 2118 + cat /proc/net/arp 2119 + 2519 2120 Do you see any hardware addresses in the cache if not you may have problems. 2520 - Next try 2521 - ping -c 5 <broadcast_addr> i.e. 
the Bcast field above in the output of 2121 + Next try:: 2122 + 2123 + ping -c 5 <broadcast_addr> 2124 + 2125 + i.e. the Bcast field above in the output of 2522 2126 ifconfig. Do you see any replies from machines other than the local machine 2523 2127 if not you may have problems. also if the TX packets count in ifconfig 2524 - hasn't incremented either you have serious problems in your driver 2525 - (e.g. the txbusy field of the network device being stuck on ) 2128 + hasn't incremented either you have serious problems in your driver 2129 + (e.g. the txbusy field of the network device being stuck on ) 2526 2130 or you may have multiple network devices connected. 2527 2131 2528 2132 ··· 2545 2119 ------- 2546 2120 There is a new device layer for channel devices, some 2547 2121 drivers e.g. lcs are registered with this layer. 2122 + 2548 2123 If the device uses the channel device layer you'll be 2549 - able to find what interrupts it uses & the current state 2124 + able to find what interrupts it uses & the current state 2550 2125 of the device. 2126 + 2551 2127 See the manpage chandev.8 &type cat /proc/chandev for more info. 2552 2128 2553 2129 2554 2130 SysRq 2555 2131 ===== 2556 2132 This is now supported by linux for s/390 & z/Architecture. 2557 - To enable it do compile the kernel with 2558 - Kernel Hacking -> Magic SysRq Key Enabled 2559 - echo "1" > /proc/sys/kernel/sysrq 2560 - also type 2561 - echo "8" >/proc/sys/kernel/printk 2133 + 2134 + To enable it do compile the kernel with:: 2135 + 2136 + Kernel Hacking -> Magic SysRq Key Enabled 2137 + 2138 + Then:: 2139 + 2140 + echo "1" > /proc/sys/kernel/sysrq 2141 + 2142 + also type:: 2143 + 2144 + echo "8" >/proc/sys/kernel/printk 2145 + 2562 2146 To make printk output go to console. 2563 - On 390 all commands are prefixed with 2564 - ^- 2565 - e.g. 2566 - ^-t will show tasks. 2567 - ^-? or some unknown command will display help. 
2147 + 2148 + On 390 all commands are prefixed with:: 2149 + 2150 + ^- 2151 + 2152 + e.g.:: 2153 + 2154 + ^-t will show tasks. 2155 + ^-? or some unknown command will display help. 2156 + 2568 2157 The sysrq key reading is very picky ( I have to type the keys in an 2569 - xterm session & paste them into the x3270 console ) 2158 + xterm session & paste them into the x3270 console ) 2570 2159 & it may be wise to predefine the keys as described in the VM hints above 2571 2160 2572 2161 This is particularly useful for syncing disks unmounting & rebooting ··· 2591 2150 2592 2151 References: 2593 2152 =========== 2594 - Enterprise Systems Architecture Reference Summary 2595 - Enterprise Systems Architecture Principles of Operation 2596 - Hartmut Penners s390 stack frame sheet. 2597 - IBM Mainframe Channel Attachment a technology brief from a CISCO webpage 2598 - Various bits of man & info pages of Linux. 2599 - Linux & GDB source. 2600 - Various info & man pages. 2601 - CMS Help on tracing commands. 2602 - Linux for s/390 Elf Application Binary Interface 2603 - Linux for z/Series Elf Application Binary Interface ( Both Highly Recommended ) 2604 - z/Architecture Principles of Operation SA22-7832-00 2605 - Enterprise Systems Architecture/390 Reference Summary SA22-7209-01 & the 2606 - Enterprise Systems Architecture/390 Principles of Operation SA22-7201-05 2153 + - Enterprise Systems Architecture Reference Summary 2154 + - Enterprise Systems Architecture Principles of Operation 2155 + - Hartmut Penners s390 stack frame sheet. 2156 + - IBM Mainframe Channel Attachment a technology brief from a CISCO webpage 2157 + - Various bits of man & info pages of Linux. 2158 + - Linux & GDB source. 2159 + - Various info & man pages. 2160 + - CMS Help on tracing commands. 
2161 + - Linux for s/390 Elf Application Binary Interface 2162 + - Linux for z/Series Elf Application Binary Interface ( Both Highly Recommended ) 2163 + - z/Architecture Principles of Operation SA22-7832-00 2164 + - Enterprise Systems Architecture/390 Reference Summary SA22-7209-01 & the 2165 + - Enterprise Systems Architecture/390 Principles of Operation SA22-7201-05 2607 2166 2608 2167 Special Thanks 2609 2168 ==============
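Note on the /proc/<pid>/maps listing quoted in the debugging390 hunks above: each row has the shape `start-end perms offset major:minor inode [path]`. The following is a minimal user-space sketch, in C, of splitting one such row into its fields. The `maps_row` struct and the `parse_maps_row()` helper are hypothetical names introduced here for illustration only; they are not part of this commit or of the kernel tree, and paths containing spaces are not handled.

```c
#include <stdio.h>
#include <string.h>

/* One parsed row of /proc/<pid>/maps (hypothetical helper type). */
struct maps_row {
    unsigned long start;     /* first address of the mapping */
    unsigned long end;       /* first address past the mapping */
    char perms[5];           /* e.g. "r-xp" plus terminating NUL */
    unsigned long offset;    /* offset into the backing file */
    unsigned int dev_major;  /* device numbers are printed in hex */
    unsigned int dev_minor;
    unsigned long inode;
    char path[256];          /* empty for anonymous mappings */
};

/*
 * Parse a single maps row.
 * Returns 0 on success, -1 if the line does not look like a maps row.
 */
static int parse_maps_row(const char *line, struct maps_row *row)
{
    int n;

    row->path[0] = '\0';
    n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255s",
               &row->start, &row->end, row->perms, &row->offset,
               &row->dev_major, &row->dev_minor, &row->inode, row->path);

    /* Seven fields are mandatory; the path (eighth) is optional. */
    return n >= 7 ? 0 : -1;
}
```

Called on the first row quoted above (`00400000-00478000 r-xp 00000000 5f:00 4103 /bin/bash`), this fills in a start of 0x00400000, permissions `r-xp` and the path `/bin/bash`. Rows for anonymous mappings leave `path` empty.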
+206 -148
Documentation/s390/cds.txt Documentation/s390/cds.rst
··· 1 + =========================== 1 2 Linux for S/390 and zSeries 3 + =========================== 2 4 3 5 Common Device Support (CDS) 4 6 Device Driver I/O Support Routines 5 7 6 - Authors : Ingo Adlung 7 - Cornelia Huck 8 + Authors: 9 + - Ingo Adlung 10 + - Cornelia Huck 8 11 9 12 Copyright, IBM Corp. 1999-2002 10 13 11 14 Introduction 15 + ============ 12 16 13 17 This document describes the common device support routines for Linux/390. 14 18 Different than other hardware architectures, ESA/390 has defined a unified ··· 31 27 32 28 In order to build common device support for ESA/390 I/O interfaces, a 33 29 functional layer was introduced that provides generic I/O access methods to 34 - the hardware. 30 + the hardware. 35 31 36 - The common device support layer comprises the I/O support routines defined 37 - below. Some of them implement common Linux device driver interfaces, while 32 + The common device support layer comprises the I/O support routines defined 33 + below. Some of them implement common Linux device driver interfaces, while 38 34 some of them are ESA/390 platform specific. 39 35 40 36 Note: 41 - In order to write a driver for S/390, you also need to look into the interface 42 - described in Documentation/s390/driver-model.txt. 37 + In order to write a driver for S/390, you also need to look into the interface 38 + described in Documentation/s390/driver-model.rst. 43 39 44 40 Note for porting drivers from 2.4: 41 + 45 42 The major changes are: 43 + 46 44 * The functions use a ccw_device instead of an irq (subchannel). 47 45 * All drivers must define a ccw_driver (see driver-model.txt) and the associated 48 46 functions. ··· 63 57 ccw_device_get_ciw() 64 58 get commands from extended sense data. 
65 59 66 - ccw_device_start() 67 - ccw_device_start_timeout() 68 - ccw_device_start_key() 69 - ccw_device_start_key_timeout() 60 + ccw_device_start(), ccw_device_start_timeout(), ccw_device_start_key(), ccw_device_start_key_timeout() 70 61 initiate an I/O request. 71 62 72 63 ccw_device_resume() 73 64 resume channel program execution. 74 65 75 - ccw_device_halt() 66 + ccw_device_halt() 76 67 terminate the current I/O request processed on the device. 77 68 78 - do_IRQ() 69 + do_IRQ() 79 70 generic interrupt routine. This function is called by the interrupt entry 80 71 routine whenever an I/O interrupt is presented to the system. The do_IRQ() 81 72 routine determines the interrupt status and calls the device specific ··· 85 82 callable interface. Instead, the functional description of do_IO() also 86 83 describes the input to the device specific interrupt handler. 87 84 88 - Note: All explanations apply also to the 64 bit architecture s390x. 85 + Note: 86 + All explanations apply also to the 64 bit architecture s390x. 89 87 90 88 91 89 Common Device Support (CDS) for Linux/390 Device Drivers 90 + ======================================================== 92 91 93 92 General Information 93 + ------------------- 94 94 95 95 The following chapters describe the I/O related interface routines the 96 96 Linux/390 common device support (CDS) provides to allow for device specific ··· 107 101 linux/arch/s390/include/asm/irq.h. 108 102 109 103 Overview of CDS interface concepts 104 + ---------------------------------- 110 105 111 106 Different to other hardware platforms, the ESA/390 architecture doesn't define 112 107 interrupt lines managed by a specific interrupt controller and bus systems ··· 133 126 determine the device driver owning the device that raised the interrupt. 134 127 135 128 Up to kernel 2.4, Linux/390 used to provide interfaces via the IRQ (subchannel). 136 - For internal use of the common I/O layer, these are still there. 
However, 129 + For internal use of the common I/O layer, these are still there. However, 137 130 device drivers should use the new calling interface via the ccw_device only. 138 131 139 132 During its startup the Linux/390 system checks for peripheral devices. Each ··· 141 134 channel subsystem. While the subchannel numbers are system generated, each 142 135 subchannel also takes a user defined attribute, the so called device number. 143 136 Both subchannel number and device number cannot exceed 65535. During sysfs 144 - initialisation, the information about control unit type and device types that 137 + initialisation, the information about control unit type and device types that 145 138 imply specific I/O commands (channel command words - CCWs) in order to operate 146 139 the device are gathered. Device drivers can retrieve this set of hardware 147 140 information during their initialization step to recognize the devices they ··· 171 164 This call enables a device driver to get information about supported commands 172 165 from the extended SenseID data. 173 166 174 - struct ciw * 175 - ccw_device_get_ciw(struct ccw_device *cdev, __u32 cmd); 167 + :: 176 168 177 - cdev - The ccw_device for which the command is to be retrieved. 178 - cmd - The command type to be retrieved. 169 + struct ciw * 170 + ccw_device_get_ciw(struct ccw_device *cdev, __u32 cmd); 171 + 172 + ==== ======================================================== 173 + cdev The ccw_device for which the command is to be retrieved. 174 + cmd The command type to be retrieved. 175 + ==== ======================================================== 179 176 180 177 ccw_device_get_ciw() returns: 181 - NULL - No extended data available, invalid device or command not found. 182 - !NULL - The command requested. 183 178 179 + ===== ================================================================ 180 + NULL No extended data available, invalid device or command not found. 181 + !NULL The command requested. 
182 + ===== ================================================================ 184 183 185 - ccw_device_start() - Initiate I/O Request 184 + :: 185 + 186 + ccw_device_start() - Initiate I/O Request 186 187 187 188 The ccw_device_start() routines is the I/O request front-end processor. All 188 189 device driver I/O requests must be issued using this routine. A device driver ··· 201 186 driver's interrupt handler as this is related to the rules (flags) defined 202 187 with the associated I/O request when calling ccw_device_start(). 203 188 204 - int ccw_device_start(struct ccw_device *cdev, 205 - struct ccw1 *cpa, 206 - unsigned long intparm, 207 - __u8 lpm, 208 - unsigned long flags); 209 - int ccw_device_start_timeout(struct ccw_device *cdev, 210 - struct ccw1 *cpa, 211 - unsigned long intparm, 212 - __u8 lpm, 213 - unsigned long flags, 214 - int expires); 215 - int ccw_device_start_key(struct ccw_device *cdev, 216 - struct ccw1 *cpa, 217 - unsigned long intparm, 218 - __u8 lpm, 219 - __u8 key, 220 - unsigned long flags); 221 - int ccw_device_start_key_timeout(struct ccw_device *cdev, 222 - struct ccw1 *cpa, 223 - unsigned long intparm, 224 - __u8 lpm, 225 - __u8 key, 226 - unsigned long flags, 227 - int expires); 189 + :: 228 190 229 - cdev : ccw_device the I/O is destined for 230 - cpa : logical start address of channel program 231 - user_intparm : user specific interrupt information; will be presented 232 - back to the device driver's interrupt handler. Allows a 233 - device driver to associate the interrupt with a 234 - particular I/O request. 235 - lpm : defines the channel path to be used for a specific I/O 236 - request. A value of 0 will make cio use the opm. 237 - key : the storage key to use for the I/O (useful for operating on a 238 - storage with a storage key != default key) 239 - flag : defines the action to be performed for I/O processing 240 - expires : timeout value in jiffies. 
The common I/O layer will terminate 241 - the running program after this and call the interrupt handler 242 - with ERR_PTR(-ETIMEDOUT) as irb. 191 + int ccw_device_start(struct ccw_device *cdev, 192 + struct ccw1 *cpa, 193 + unsigned long intparm, 194 + __u8 lpm, 195 + unsigned long flags); 196 + int ccw_device_start_timeout(struct ccw_device *cdev, 197 + struct ccw1 *cpa, 198 + unsigned long intparm, 199 + __u8 lpm, 200 + unsigned long flags, 201 + int expires); 202 + int ccw_device_start_key(struct ccw_device *cdev, 203 + struct ccw1 *cpa, 204 + unsigned long intparm, 205 + __u8 lpm, 206 + __u8 key, 207 + unsigned long flags); 208 + int ccw_device_start_key_timeout(struct ccw_device *cdev, 209 + struct ccw1 *cpa, 210 + unsigned long intparm, 211 + __u8 lpm, 212 + __u8 key, 213 + unsigned long flags, 214 + int expires); 243 215 244 - Possible flag values are : 216 + ============= ============================================================= 217 + cdev ccw_device the I/O is destined for 218 + cpa logical start address of channel program 219 + user_intparm user specific interrupt information; will be presented 220 + back to the device driver's interrupt handler. Allows a 221 + device driver to associate the interrupt with a 222 + particular I/O request. 223 + lpm defines the channel path to be used for a specific I/O 224 + request. A value of 0 will make cio use the opm. 225 + key the storage key to use for the I/O (useful for operating on a 226 + storage with a storage key != default key) 227 + flag defines the action to be performed for I/O processing 228 + expires timeout value in jiffies. The common I/O layer will terminate 229 + the running program after this and call the interrupt handler 230 + with ERR_PTR(-ETIMEDOUT) as irb. 
231 + ============= ============================================================= 245 232 246 - DOIO_ALLOW_SUSPEND - channel program may become suspended 247 - DOIO_DENY_PREFETCH - don't allow for CCW prefetch; usually 248 - this implies the channel program might 249 - become modified 250 - DOIO_SUPPRESS_INTER - don't call the handler on intermediate status 233 + Possible flag values are: 251 234 252 - The cpa parameter points to the first format 1 CCW of a channel program : 235 + ========================= ============================================= 236 + DOIO_ALLOW_SUSPEND channel program may become suspended 237 + DOIO_DENY_PREFETCH don't allow for CCW prefetch; usually 238 + this implies the channel program might 239 + become modified 240 + DOIO_SUPPRESS_INTER don't call the handler on intermediate status 241 + ========================= ============================================= 253 242 254 - struct ccw1 { 255 - __u8 cmd_code;/* command code */ 256 - __u8 flags; /* flags, like IDA addressing, etc. */ 257 - __u16 count; /* byte count */ 258 - __u32 cda; /* data address */ 259 - } __attribute__ ((packed,aligned(8))); 243 + The cpa parameter points to the first format 1 CCW of a channel program:: 260 244 261 - with the following CCW flags values defined : 245 + struct ccw1 { 246 + __u8 cmd_code;/* command code */ 247 + __u8 flags; /* flags, like IDA addressing, etc. 
*/ 248 + __u16 count; /* byte count */ 249 + __u32 cda; /* data address */ 250 + } __attribute__ ((packed,aligned(8))); 262 251 263 - CCW_FLAG_DC - data chaining 264 - CCW_FLAG_CC - command chaining 265 - CCW_FLAG_SLI - suppress incorrect length 266 - CCW_FLAG_SKIP - skip 267 - CCW_FLAG_PCI - PCI 268 - CCW_FLAG_IDA - indirect addressing 269 - CCW_FLAG_SUSPEND - suspend 252 + with the following CCW flags values defined: 253 + 254 + =================== ========================= 255 + CCW_FLAG_DC data chaining 256 + CCW_FLAG_CC command chaining 257 + CCW_FLAG_SLI suppress incorrect length 258 + CCW_FLAG_SKIP skip 259 + CCW_FLAG_PCI PCI 260 + CCW_FLAG_IDA indirect addressing 261 + CCW_FLAG_SUSPEND suspend 262 + =================== ========================= 270 263 271 264 272 265 Via ccw_device_set_options(), the device driver may specify the following 273 266 options for the device: 274 267 275 - DOIO_EARLY_NOTIFICATION - allow for early interrupt notification 276 - DOIO_REPORT_ALL - report all interrupt conditions 268 + ========================= ====================================== 269 + DOIO_EARLY_NOTIFICATION allow for early interrupt notification 270 + DOIO_REPORT_ALL report all interrupt conditions 271 + ========================= ====================================== 277 272 278 273 279 - The ccw_device_start() function returns : 274 + The ccw_device_start() function returns: 280 275 281 - 0 - successful completion or request successfully initiated 282 - -EBUSY - The device is currently processing a previous I/O request, or there is 283 - a status pending at the device. 284 - -ENODEV - cdev is invalid, the device is not operational or the ccw_device is 285 - not online. 276 + ======== ====================================================================== 277 + 0 successful completion or request successfully initiated 278 + -EBUSY The device is currently processing a previous I/O request, or there is 279 + a status pending at the device. 
280 + -ENODEV cdev is invalid, the device is not operational or the ccw_device is 281 + not online. 282 + ======== ====================================================================== 286 283 287 284 When the I/O request completes, the CDS first level interrupt handler will 288 285 accumulate the status in a struct irb and then call the device interrupt handler. 289 - The intparm field will contain the value the device driver has associated with a 290 - particular I/O request. If a pending device status was recognized, 286 + The intparm field will contain the value the device driver has associated with a 287 + particular I/O request. If a pending device status was recognized, 291 288 intparm will be set to 0 (zero). This may happen during I/O initiation or delayed 292 289 by an alert status notification. In any case this status is not related to the 293 290 current (last) I/O request. In case of a delayed status notification no special ··· 309 282 The irb may contain an error value, and the device driver should check for this 310 283 first: 311 284 312 - -ETIMEDOUT: the common I/O layer terminated the request after the specified 313 - timeout value 314 - -EIO: the common I/O layer terminated the request due to an error state 285 + ========== ================================================================= 286 + -ETIMEDOUT the common I/O layer terminated the request after the specified 287 + timeout value 288 + -EIO the common I/O layer terminated the request due to an error state 289 + ========== ================================================================= 315 290 316 291 If the concurrent sense flag in the extended status word (esw) in the irb is 317 292 set, the field erw.scnt in the esw describes the number of device specific ··· 323 294 The device interrupt handler can use the following definitions to investigate 324 295 the primary unit check source coded in sense byte 0 : 325 296 297 + ======================= ==== 326 298 SNS0_CMD_REJECT 0x80 327 299 
SNS0_INTERVENTION_REQ 0x40 328 300 SNS0_BUS_OUT_CHECK 0x20 ··· 331 301 SNS0_DATA_CHECK 0x08 332 302 SNS0_OVERRUN 0x04 333 303 SNS0_INCOMPL_DOMAIN 0x01 304 + ======================= ==== 334 305 335 306 Depending on the device status, multiple of those values may be set together. 336 307 Please refer to the device specific documentation for details. 337 308 338 309 The irb->scsw.cstat field provides the (accumulated) subchannel status : 339 310 340 - SCHN_STAT_PCI - program controlled interrupt 341 - SCHN_STAT_INCORR_LEN - incorrect length 342 - SCHN_STAT_PROG_CHECK - program check 343 - SCHN_STAT_PROT_CHECK - protection check 344 - SCHN_STAT_CHN_DATA_CHK - channel data check 345 - SCHN_STAT_CHN_CTRL_CHK - channel control check 346 - SCHN_STAT_INTF_CTRL_CHK - interface control check 347 - SCHN_STAT_CHAIN_CHECK - chaining check 311 + ========================= ============================ 312 + SCHN_STAT_PCI program controlled interrupt 313 + SCHN_STAT_INCORR_LEN incorrect length 314 + SCHN_STAT_PROG_CHECK program check 315 + SCHN_STAT_PROT_CHECK protection check 316 + SCHN_STAT_CHN_DATA_CHK channel data check 317 + SCHN_STAT_CHN_CTRL_CHK channel control check 318 + SCHN_STAT_INTF_CTRL_CHK interface control check 319 + SCHN_STAT_CHAIN_CHECK chaining check 320 + ========================= ============================ 348 321 349 322 The irb->scsw.dstat field provides the (accumulated) device status : 350 323 351 - DEV_STAT_ATTENTION - attention 352 - DEV_STAT_STAT_MOD - status modifier 353 - DEV_STAT_CU_END - control unit end 354 - DEV_STAT_BUSY - busy 355 - DEV_STAT_CHN_END - channel end 356 - DEV_STAT_DEV_END - device end 357 - DEV_STAT_UNIT_CHECK - unit check 358 - DEV_STAT_UNIT_EXCEP - unit exception 324 + ===================== ================= 325 + DEV_STAT_ATTENTION attention 326 + DEV_STAT_STAT_MOD status modifier 327 + DEV_STAT_CU_END control unit end 328 + DEV_STAT_BUSY busy 329 + DEV_STAT_CHN_END channel end 330 + DEV_STAT_DEV_END device end 331 + 
DEV_STAT_UNIT_CHECK unit check 332 + DEV_STAT_UNIT_EXCEP unit exception 333 + ===================== ================= 359 334 360 335 Please see the ESA/390 Principles of Operation manual for details on the 361 336 individual flag meanings. 362 337 363 - Usage Notes : 338 + Usage Notes: 364 339 365 340 ccw_device_start() must be called disabled and with the ccw device lock held. 366 341 ··· 409 374 successful completion for all overlapping ccw_device_start() requests that have 410 375 been issued since the last secondary (final) status. 411 376 412 - Channel programs that intend to set the suspend flag on a channel command word 413 - (CCW) must start the I/O operation with the DOIO_ALLOW_SUSPEND option or the 414 - suspend flag will cause a channel program check. At the time the channel program 415 - becomes suspended an intermediate interrupt will be generated by the channel 377 + Channel programs that intend to set the suspend flag on a channel command word 378 + (CCW) must start the I/O operation with the DOIO_ALLOW_SUSPEND option or the 379 + suspend flag will cause a channel program check. At the time the channel program 380 + becomes suspended an intermediate interrupt will be generated by the channel 416 381 subsystem. 417 382 418 - ccw_device_resume() - Resume Channel Program Execution 383 + ccw_device_resume() - Resume Channel Program Execution 419 384 420 - If a device driver chooses to suspend the current channel program execution by 421 - setting the CCW suspend flag on a particular CCW, the channel program execution 422 - is suspended. In order to resume channel program execution the CIO layer 423 - provides the ccw_device_resume() routine. 385 + If a device driver chooses to suspend the current channel program execution by 386 + setting the CCW suspend flag on a particular CCW, the channel program execution 387 + is suspended. In order to resume channel program execution the CIO layer 388 + provides the ccw_device_resume() routine. 
424 389 425 - int ccw_device_resume(struct ccw_device *cdev); 390 + :: 426 391 427 - cdev - ccw_device the resume operation is requested for 392 + int ccw_device_resume(struct ccw_device *cdev); 393 + 394 + ==== ================================================ 395 + cdev ccw_device the resume operation is requested for 396 + ==== ================================================ 428 397 429 398 The ccw_device_resume() function returns: 430 399 431 - 0 - suspended channel program is resumed 432 - -EBUSY - status pending 433 - -ENODEV - cdev invalid or not-operational subchannel 434 - -EINVAL - resume function not applicable 435 - -ENOTCONN - there is no I/O request pending for completion 400 + ========= ============================================== 401 + 0 suspended channel program is resumed 402 + -EBUSY status pending 403 + -ENODEV cdev invalid or not-operational subchannel 404 + -EINVAL resume function not applicable 405 + -ENOTCONN there is no I/O request pending for completion 406 + ========= ============================================== 436 407 437 408 Usage Notes: 409 + 438 410 Please have a look at the ccw_device_start() usage notes for more details on 439 411 suspended channel programs. 440 412 ··· 454 412 455 413 ccw_device_halt() must be called disabled and with the ccw device lock held. 
456 414 457 - int ccw_device_halt(struct ccw_device *cdev, 458 - unsigned long intparm); 415 + :: 459 416 460 - cdev : ccw_device the halt operation is requested for 461 - intparm : interruption parameter; value is only used if no I/O 462 - is outstanding, otherwise the intparm associated with 463 - the I/O request is returned 417 + int ccw_device_halt(struct ccw_device *cdev, 418 + unsigned long intparm); 464 419 465 - The ccw_device_halt() function returns : 420 + ======= ===================================================== 421 + cdev ccw_device the halt operation is requested for 422 + intparm interruption parameter; value is only used if no I/O 423 + is outstanding, otherwise the intparm associated with 424 + the I/O request is returned 425 + ======= ===================================================== 466 426 467 - 0 - request successfully initiated 468 - -EBUSY - the device is currently busy, or status pending. 469 - -ENODEV - cdev invalid. 470 - -EINVAL - The device is not operational or the ccw device is not online. 427 + The ccw_device_halt() function returns: 471 428 472 - Usage Notes : 429 + ======= ============================================================== 430 + 0 request successfully initiated 431 + -EBUSY the device is currently busy, or status pending. 432 + -ENODEV cdev invalid. 433 + -EINVAL The device is not operational or the ccw device is not online. 434 + ======= ============================================================== 435 + 436 + Usage Notes: 473 437 474 438 A device driver may write a never-ending channel program by writing a channel 475 439 program that at its end loops back to its beginning by means of a transfer in ··· 486 438 read to a network device (with or without PCI flag) a ccw_device_halt() 487 439 is required to end the pending operation. 
488 440 489 - ccw_device_clear() - Terminage I/O Request Processing 441 + :: 442 + 443 + ccw_device_clear() - Terminage I/O Request Processing 490 444 491 445 In order to terminate all I/O processing at the subchannel, the clear subchannel 492 446 (CSCH) command is used. It can be issued via ccw_device_clear(). 493 447 494 448 ccw_device_clear() must be called disabled and with the ccw device lock held. 495 449 496 - int ccw_device_clear(struct ccw_device *cdev, unsigned long intparm); 450 + :: 497 451 498 - cdev: ccw_device the clear operation is requested for 499 - intparm: interruption parameter (see ccw_device_halt()) 452 + int ccw_device_clear(struct ccw_device *cdev, unsigned long intparm); 453 + 454 + ======= =============================================== 455 + cdev ccw_device the clear operation is requested for 456 + intparm interruption parameter (see ccw_device_halt()) 457 + ======= =============================================== 500 458 501 459 The ccw_device_clear() function returns: 502 460 503 - 0 - request successfully initiated 504 - -ENODEV - cdev invalid 505 - -EINVAL - The device is not operational or the ccw device is not online. 461 + ======= ============================================================== 462 + 0 request successfully initiated 463 + -ENODEV cdev invalid 464 + -EINVAL The device is not operational or the ccw device is not online. 465 + ======= ============================================================== 506 466 507 467 Miscellaneous Support Routines 468 + ------------------------------ 508 469 509 470 This chapter describes various routines to be used in a Linux/390 device 510 471 driver programming environment. ··· 523 466 Get the address of the device specific lock. This is then used in 524 467 spin_lock() / spin_unlock() calls. 
525 468 469 + :: 526 470 527 - __u8 ccw_device_get_path_mask(struct ccw_device *cdev); 471 + __u8 ccw_device_get_path_mask(struct ccw_device *cdev); 528 472 529 473 Get the mask of the path currently available for cdev.
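The return-code table for ccw_device_resume() in this hunk can be turned into a small lookup for log messages. The following is a minimal user-space sketch, not kernel code; the helper name `ccw_resume_rc_text` is hypothetical, and only the codes listed in the table are covered.

```c
#include <errno.h>

/*
 * Illustrative helper (not part of the kernel API): map a
 * ccw_device_resume() return value to the meaning given in the
 * return-code table of cds.rst.
 */
static const char *ccw_resume_rc_text(int rc)
{
	switch (rc) {
	case 0:
		return "suspended channel program is resumed";
	case -EBUSY:
		return "status pending";
	case -ENODEV:
		return "cdev invalid or not-operational subchannel";
	case -EINVAL:
		return "resume function not applicable";
	case -ENOTCONN:
		return "there is no I/O request pending for completion";
	default:
		/* Anything else is outside the documented table. */
		return "return code not listed in the documentation";
	}
}
```

Note that ccw_device_halt() and ccw_device_clear() document different meanings for -EINVAL, so a real driver would likely keep one such table per routine.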
+110 -69
Documentation/s390/driver-model.txt Documentation/s390/driver-model.rst
··· 1 + ============================= 1 2 S/390 driver model interfaces 2 - ----------------------------- 3 + ============================= 3 4 4 5 1. CCW devices 5 6 -------------- ··· 8 7 All devices which can be addressed by means of ccws are called 'CCW devices' - 9 8 even if they aren't actually driven by ccws. 10 9 11 - All ccw devices are accessed via a subchannel, this is reflected in the 12 - structures under devices/: 10 + All ccw devices are accessed via a subchannel, this is reflected in the 11 + structures under devices/:: 13 12 14 - devices/ 13 + devices/ 15 14 - system/ 16 15 - css0/ 17 - - 0.0.0000/0.0.0815/ 16 + - 0.0.0000/0.0.0815/ 18 17 - 0.0.0001/0.0.4711/ 19 18 - 0.0.0002/ 20 19 - 0.1.0000/0.1.1234/ ··· 36 35 37 36 All ccw devices export some data via sysfs. 38 37 39 - cutype: The control unit type / model. 38 + cutype: 39 + The control unit type / model. 40 40 41 - devtype: The device type / model, if applicable. 41 + devtype: 42 + The device type / model, if applicable. 42 43 43 - availability: Can be 'good' or 'boxed'; 'no path' or 'no device' for 44 + availability: 45 + Can be 'good' or 'boxed'; 'no path' or 'no device' for 44 46 disconnected devices. 45 47 46 - online: An interface to set the device online and offline. 48 + online: 49 + An interface to set the device online and offline. 47 50 In the special case of the device being disconnected (see the 48 51 notify function under 1.2), piping 0 to online will forcibly delete 49 52 the device. ··· 57 52 There is also some data exported on a per-subchannel basis (see under 58 53 bus/css/devices/): 59 54 60 - chpids: Via which chpids the device is connected. 55 + chpids: 56 + Via which chpids the device is connected. 61 57 62 - pimpampom: The path installed, path available and path operational masks. 58 + pimpampom: 59 + The path installed, path available and path operational masks. 63 60 64 61 There also might be additional data, for example for block devices. 
65 62 ··· 81 74 ------------------------------------ 82 75 83 76 The basic struct ccw_device and struct ccw_driver data structures can be found 84 - under include/asm/ccwdev.h. 77 + under include/asm/ccwdev.h:: 85 78 86 - struct ccw_device { 87 - spinlock_t *ccwlock; 88 - struct ccw_device_private *private; 89 - struct ccw_device_id id; 79 + struct ccw_device { 80 + spinlock_t *ccwlock; 81 + struct ccw_device_private *private; 82 + struct ccw_device_id id; 90 83 91 - struct ccw_driver *drv; 92 - struct device dev; 84 + struct ccw_driver *drv; 85 + struct device dev; 93 86 int online; 94 87 95 88 void (*handler) (struct ccw_device *dev, unsigned long intparm, 96 - struct irb *irb); 97 - }; 89 + struct irb *irb); 90 + }; 98 91 99 - struct ccw_driver { 100 - struct module *owner; 101 - struct ccw_device_id *ids; 102 - int (*probe) (struct ccw_device *); 92 + struct ccw_driver { 93 + struct module *owner; 94 + struct ccw_device_id *ids; 95 + int (*probe) (struct ccw_device *); 103 96 int (*remove) (struct ccw_device *); 104 97 int (*set_online) (struct ccw_device *); 105 98 int (*set_offline) (struct ccw_device *); 106 99 int (*notify) (struct ccw_device *, int); 107 100 struct device_driver driver; 108 101 char *name; 109 - }; 102 + }; 110 103 111 104 The 'private' field contains data needed for internal i/o operation only, and 112 105 is not available to the device driver. 113 106 114 107 Each driver should declare in a MODULE_DEVICE_TABLE into which CU types/models 115 108 and/or device types/models it is interested. 
This information can later be found 116 - in the struct ccw_device_id fields: 109 + in the struct ccw_device_id fields:: 117 110 118 - struct ccw_device_id { 119 - __u16 match_flags; 111 + struct ccw_device_id { 112 + __u16 match_flags; 120 113 121 - __u16 cu_type; 122 - __u16 dev_type; 123 - __u8 cu_model; 124 - __u8 dev_model; 114 + __u16 cu_type; 115 + __u16 dev_type; 116 + __u8 cu_model; 117 + __u8 dev_model; 125 118 126 119 unsigned long driver_info; 127 - }; 120 + }; 128 121 129 122 The functions in ccw_driver should be used in the following way: 130 - probe: This function is called by the device layer for each device the driver 123 + 124 + probe: 125 + This function is called by the device layer for each device the driver 131 126 is interested in. The driver should only allocate private structures 132 127 to put in dev->driver_data and create attributes (if needed). Also, 133 128 the interrupt handler (see below) should be set here. 134 129 135 - int (*probe) (struct ccw_device *cdev); 130 + :: 136 131 137 - Parameters: cdev - the device to be probed. 132 + int (*probe) (struct ccw_device *cdev); 133 + 134 + Parameters: 135 + cdev 136 + - the device to be probed. 138 137 139 138 140 - remove: This function is called by the device layer upon removal of the driver, 139 + remove: 140 + This function is called by the device layer upon removal of the driver, 141 141 the device or the module. The driver should perform cleanups here. 142 142 143 - int (*remove) (struct ccw_device *cdev); 143 + :: 144 144 145 - Parameters: cdev - the device to be removed. 145 + int (*remove) (struct ccw_device *cdev); 146 + 147 + Parameters: 148 + cdev 149 + - the device to be removed. 146 150 147 151 148 - set_online: This function is called by the common I/O layer when the device is 152 + set_online: 153 + This function is called by the common I/O layer when the device is 149 154 activated via the 'online' attribute. 
The driver should finally 150 155 setup and activate the device here. 151 156 152 - int (*set_online) (struct ccw_device *); 157 + :: 153 158 154 - Parameters: cdev - the device to be activated. The common layer has 159 + int (*set_online) (struct ccw_device *); 160 + 161 + Parameters: 162 + cdev 163 + - the device to be activated. The common layer has 155 164 verified that the device is not already online. 156 165 157 166 ··· 175 152 de-activated via the 'online' attribute. The driver should shut 176 153 down the device, but not de-allocate its private data. 177 154 178 - int (*set_offline) (struct ccw_device *); 155 + :: 179 156 180 - Parameters: cdev - the device to be deactivated. The common layer has 157 + int (*set_offline) (struct ccw_device *); 158 + 159 + Parameters: 160 + cdev 161 + - the device to be deactivated. The common layer has 181 162 verified that the device is online. 182 163 183 164 184 - notify: This function is called by the common I/O layer for some state changes 165 + notify: 166 + This function is called by the common I/O layer for some state changes 185 167 of the device. 168 + 186 169 Signalled to the driver are: 170 + 187 171 * In online state, device detached (CIO_GONE) or last path gone 188 172 (CIO_NO_PATH). The driver must return !0 to keep the device; for 189 173 return code 0, the device will be deleted as usual (also when no ··· 203 173 return code of the notify function the device driver signals if it 204 174 wants the device back: !0 for keeping, 0 to make the device being 205 175 removed and re-registered. 206 - 207 - int (*notify) (struct ccw_device *, int); 208 176 209 - Parameters: cdev - the device whose state changed. 210 - event - the event that happened. This can be one of CIO_GONE, 211 - CIO_NO_PATH or CIO_OPER. 177 + :: 178 + 179 + int (*notify) (struct ccw_device *, int); 180 + 181 + Parameters: 182 + cdev 183 + - the device whose state changed. 184 + 185 + event 186 + - the event that happened. 
This can be one of CIO_GONE, 187 + CIO_NO_PATH or CIO_OPER. 212 188 213 189 The handler field of the struct ccw_device is meant to be set to the interrupt 214 - handler for the device. In order to accommodate drivers which use several 190 + handler for the device. In order to accommodate drivers which use several 215 191 distinct handlers (e.g. multi subchannel devices), this is a member of ccw_device 216 192 instead of ccw_driver. 217 193 The handler is registered with the common layer during set_online() processing 218 194 before the driver is called, and is deregistered during set_offline() after the 219 - driver has been called. Also, after registering / before deregistering, path 195 + driver has been called. Also, after registering / before deregistering, path 220 196 grouping resp. disbanding of the path group (if applicable) are performed. 221 197 222 - void (*handler) (struct ccw_device *dev, unsigned long intparm, struct irb *irb); 198 + :: 223 199 224 - Parameters: dev - the device the handler is called for 200 + void (*handler) (struct ccw_device *dev, unsigned long intparm, struct irb *irb); 201 + 202 + Parameters: dev - the device the handler is called for 225 203 intparm - the intparm which allows the device driver to identify 226 - the i/o the interrupt is associated with, or to recognize 227 - the interrupt as unsolicited. 228 - irb - interruption response block which contains the accumulated 229 - status. 204 + the i/o the interrupt is associated with, or to recognize 205 + the interrupt as unsolicited. 206 + irb - interruption response block which contains the accumulated 207 + status. 230 208 231 - The device driver is called from the common ccw_device layer and can retrieve 209 + The device driver is called from the common ccw_device layer and can retrieve 232 210 information about the interrupt from the irb parameter. 
233 211 234 212 ··· 275 237 latter consistently due to lacking machine support (we don't need to be aware 276 238 of it anyway). 277 239 278 - status - Can be 'online' or 'offline'. 240 + status 241 + - Can be 'online' or 'offline'. 279 242 Piping 'on' or 'off' sets the chpid logically online/offline. 280 243 Piping 'on' to an online chpid triggers path reprobing for all devices 281 244 the chpid connects to. This can be used to force the kernel to re-use 282 245 a channel path the user knows to be online, but the machine hasn't 283 246 created a machine check for. 284 247 285 - type - The physical type of the channel path. 248 + type 249 + - The physical type of the channel path. 286 250 287 - shared - Whether the channel path is shared. 251 + shared 252 + - Whether the channel path is shared. 288 253 289 - cmg - The channel measurement group. 254 + cmg 255 + - The channel measurement group. 290 256 291 257 3. System devices 292 258 ----------------- 293 259 294 - 3.1 xpram 260 + 3.1 xpram 295 261 --------- 296 262 297 263 xpram shows up under devices/system/ as 'xpram'. ··· 321 279 number is assigned sequentially to the connections defined via the 'connection' 322 280 attribute. 323 281 324 - user - shows the connection partner. 282 + user 283 + - shows the connection partner. 325 284 326 - buffer - maximum buffer size. 327 - Pipe to it to change buffer size. 328 - 329 - 285 + buffer 286 + - maximum buffer size. Pipe to it to change buffer size.
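The driver-model text above says each driver declares the CU types/models and device types/models it is interested in through struct ccw_device_id. How match_flags might select the fields to compare can be sketched in user space as follows. This is an assumption-laden model, not the kernel's matching code: the `MATCH_*` names and values, `struct ccw_id_model` and `ccw_id_matches` are all hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Flag names and values are assumptions made for this sketch. */
#define MATCH_CU_TYPE   0x01
#define MATCH_CU_MODEL  0x02
#define MATCH_DEV_TYPE  0x04
#define MATCH_DEV_MODEL 0x08

/* User-space mirror of the struct ccw_device_id fields shown above. */
struct ccw_id_model {
	uint16_t match_flags;
	uint16_t cu_type;
	uint16_t dev_type;
	uint8_t cu_model;
	uint8_t dev_model;
	unsigned long driver_info;
};

/*
 * Return 1 if every field selected by entry->match_flags is equal
 * in the table entry and in the device, 0 otherwise. Fields whose
 * flag is not set are ignored.
 */
static int ccw_id_matches(const struct ccw_id_model *entry,
			  const struct ccw_id_model *dev)
{
	if ((entry->match_flags & MATCH_CU_TYPE) &&
	    entry->cu_type != dev->cu_type)
		return 0;
	if ((entry->match_flags & MATCH_CU_MODEL) &&
	    entry->cu_model != dev->cu_model)
		return 0;
	if ((entry->match_flags & MATCH_DEV_TYPE) &&
	    entry->dev_type != dev->dev_type)
		return 0;
	if ((entry->match_flags & MATCH_DEV_MODEL) &&
	    entry->dev_model != dev->dev_model)
		return 0;
	return 1;
}
```

An entry with only the CU type flag set would match every device behind that control unit type, while adding the device type flag narrows the match.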
+30
Documentation/s390/index.rst
···
 1 + :orphan:
 2 +
 3 + =================
 4 + s390 Architecture
 5 + =================
 6 +
 7 + .. toctree::
 8 +     :maxdepth: 1
 9 +
10 +     cds
11 +     3270
12 +     debugging390
13 +     driver-model
14 +     monreader
15 +     qeth
16 +     s390dbf
17 +     vfio-ap
18 +     vfio-ccw
19 +     zfcpdump
20 +     dasd
21 +     common_io
22 +
23 +     text_files
24 +
25 + .. only:: subproject and html
26 +
27 +    Indices
28 +    =======
29 +
30 +    * :ref:`genindex`
+50 -35
Documentation/s390/monreader.txt Documentation/s390/monreader.rst
··· 1 + ================================================= 2 + Linux API for read access to z/VM Monitor Records 3 + ================================================= 1 4 2 5 Date : 2004-Nov-26 6 + 3 7 Author: Gerald Schaefer (geraldsc@de.ibm.com) 4 8 5 9 6 - Linux API for read access to z/VM Monitor Records 7 - ================================================= 8 10 9 11 10 12 Description 11 13 =========== 12 14 This item delivers a new Linux API in the form of a misc char device that is 13 15 usable from user space and allows read access to the z/VM Monitor Records 14 - collected by the *MONITOR System Service of z/VM. 16 + collected by the `*MONITOR` System Service of z/VM. 15 17 16 18 17 19 User Requirements 18 20 ================= 19 21 The z/VM guest on which you want to access this API needs to be configured in 20 - order to allow IUCV connections to the *MONITOR service, i.e. it needs the 21 - IUCV *MONITOR statement in its user entry. If the monitor DCSS to be used is 22 + order to allow IUCV connections to the `*MONITOR` service, i.e. it needs the 23 + IUCV `*MONITOR` statement in its user entry. If the monitor DCSS to be used is 22 24 restricted (likely), you also need the NAMESAVE <DCSS NAME> statement. 23 25 This item will use the IUCV device driver to access the z/VM services, so you 24 26 need a kernel with IUCV support. You also need z/VM version 4.4 or 5.1. ··· 52 50 and you have to specify the "mem=" kernel parameter in your parmfile with a 53 51 value greater than the ending address of the DCSS. 54 52 55 - Example: DEF STOR 140M 53 + Example:: 54 + 55 + DEF STOR 140M 56 56 57 57 This defines 140MB storage size for your guest, the parameter "mem=160M" is 58 58 added to the parmfile. ··· 70 66 in the parmfile. 71 67 72 68 The default name for the DCSS is "MONDCSS" if none is specified. In case that 73 - there are other users already connected to the *MONITOR service (e.g. 69 + there are other users already connected to the `*MONITOR` service (e.g. 
74 70 Performance Toolkit), the monitor DCSS is already defined and you have to use 75 71 the same DCSS. The CP command Q MONITOR (Class E privileged) shows the name 76 72 of the monitor DCSS, if already defined, and the users connected to the 77 - *MONITOR service. 73 + `*MONITOR` service. 78 74 Refer to the "z/VM Performance" book (SC24-6109-00) on how to create a monitor 79 75 DCSS if your z/VM doesn't have one already, you need Class E privileges to 80 76 define and save a DCSS. 81 77 82 78 Example: 83 79 -------- 84 - modprobe monreader mondcss=MYDCSS 80 + 81 + :: 82 + 83 + modprobe monreader mondcss=MYDCSS 85 84 86 85 This loads the module and sets the DCSS name to "MYDCSS". 87 86 88 87 NOTE: 89 88 ----- 90 - This API provides no interface to control the *MONITOR service, e.g. specify 89 + This API provides no interface to control the `*MONITOR` service, e.g. specify 91 90 which data should be collected. This can be done by the CP command MONITOR 92 91 (Class E privileged), see "CP Command and Utility Reference". 93 92 ··· 105 98 automatically and you have to create it manually after loading the module. 106 99 Therefore you need to know the major and minor numbers of the device. These 107 100 numbers can be found in /sys/class/misc/monreader/dev. 101 + 108 102 Typing cat /sys/class/misc/monreader/dev will give an output of the form 109 103 <major>:<minor>. The device node can be created via the mknod command, enter 110 104 mknod <name> c <major> <minor>, where <name> is the name of the device node ··· 113 105 114 106 Example: 115 107 -------- 116 - # modprobe monreader 117 - # cat /sys/class/misc/monreader/dev 118 - 10:63 119 - # mknod /dev/monreader c 10 63 108 + 109 + :: 110 + 111 + # modprobe monreader 112 + # cat /sys/class/misc/monreader/dev 113 + 10:63 114 + # mknod /dev/monreader c 10 63 120 115 121 116 This loads the module with the default monitor DCSS (MONDCSS) and creates a 122 117 device node. ··· 144 133 correctly (domain 1, record 13), i.e. 
it can be used to determine the record 145 134 start offset relative to a 4K page (frame) boundary. 146 135 147 - See "Appendix A: *MONITOR" in the "z/VM Performance" document for a description 136 + See "Appendix A: `*MONITOR`" in the "z/VM Performance" document for a description 148 137 of the monitor control element layout. The layout of the monitor records can 149 138 be found here (z/VM 5.1): http://www.vm.ibm.com/pubs/mon510/index.html 150 139 151 - The layout of the data stream provided by the monreader device is as follows: 152 - ... 153 - <0 byte read> 154 - <first MCE> \ 155 - <first set of records> | 156 - ... |- data set 157 - <last MCE> | 158 - <last set of records> / 159 - <0 byte read> 160 - ... 140 + The layout of the data stream provided by the monreader device is as follows:: 141 + 142 + ... 143 + <0 byte read> 144 + <first MCE> \ 145 + <first set of records> | 146 + ... |- data set 147 + <last MCE> | 148 + <last set of records> / 149 + <0 byte read> 150 + ... 161 151 162 152 There may be more than one combination of MCE and corresponding record set 163 153 within one data set and the end of each data set is indicated by a successful ··· 177 165 negative value for the number of bytes read. In this case, the errno variable 178 166 indicates the error condition: 179 167 180 - EIO: reply failed, read data is invalid and the application 168 + EIO: 169 + reply failed, read data is invalid and the application 181 170 should discard the data read since the last successful read with 0 size. 182 - EFAULT: copy_to_user failed, read data is invalid and the application should 183 - discard the data read since the last successful read with 0 size. 184 - EAGAIN: occurs on a non-blocking read if there is no data available at the 185 - moment. There is no data missing or corrupted, just try again or rather 186 - use polling for non-blocking reads. 
187 - EOVERFLOW: message limit reached, the data read since the last successful 188 - read with 0 size is valid but subsequent records may be missing. 171 + EFAULT: 172 + copy_to_user failed, read data is invalid and the application should 173 + discard the data read since the last successful read with 0 size. 174 + EAGAIN: 175 + occurs on a non-blocking read if there is no data available at the 176 + moment. There is no data missing or corrupted, just try again or rather 177 + use polling for non-blocking reads. 178 + EOVERFLOW: 179 + message limit reached, the data read since the last successful 180 + read with 0 size is valid but subsequent records may be missing. 189 181 190 182 In the last case (EOVERFLOW) there may be missing data, in the first two cases 191 183 (EIO, EFAULT) there will be missing data. It's up to the application if it will ··· 199 183 ----- 200 184 Only one user is allowed to open the char device. If it is already in use, the 201 185 open function will fail (return a negative value) and set errno to EBUSY. 202 - The open function may also fail if an IUCV connection to the *MONITOR service 186 + The open function may also fail if an IUCV connection to the `*MONITOR` service 203 187 cannot be established. In this case errno will be set to EIO and an error 204 188 message with an IPUSER SEVER code will be printed into syslog. The IPUSER SEVER 205 189 codes are described in the "z/VM Performance" book, Appendix A. ··· 210 194 will account for the message limit, i.e. opening the device without reading 211 195 from it will provoke the "message limit reached" error (EOVERFLOW error code) 212 196 eventually. 213 -
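The read semantics described in the monreader text (0-byte read marks the end of a data set, errno tells whether buffered data is still valid) can be condensed into one decision function. This is a minimal user-space sketch; the enum and the function `mon_classify_read` are hypothetical names, and the mapping simply restates the EIO, EFAULT, EAGAIN and EOVERFLOW rules from the document.

```c
#include <errno.h>
#include <sys/types.h>

/* What an application should do after one read() on the monreader device. */
enum mon_action {
	MON_DATA,       /* n > 0: bytes belong to the current data set */
	MON_END_OF_SET, /* n == 0: data set is complete and valid */
	MON_RETRY,      /* EAGAIN: nothing lost, poll and read again */
	MON_DISCARD,    /* EIO, EFAULT: drop data since the last 0-byte read */
	MON_GAP,        /* EOVERFLOW: data is valid, later records may be missing */
	MON_FATAL       /* any errno the document does not describe */
};

/*
 * n is the return value of read(), err is the errno value observed
 * right after a failed read (ignored when n >= 0).
 */
static enum mon_action mon_classify_read(ssize_t n, int err)
{
	if (n > 0)
		return MON_DATA;
	if (n == 0)
		return MON_END_OF_SET;

	switch (err) {
	case EAGAIN:
		return MON_RETRY;
	case EIO:
	case EFAULT:
		return MON_DISCARD;
	case EOVERFLOW:
		return MON_GAP;
	default:
		return MON_FATAL;
	}
}
```

A read loop would typically append to a buffer on MON_DATA, hand the buffer to the consumer on MON_END_OF_SET, and reset the buffer on MON_DISCARD.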
+25 -11
Documentation/s390/qeth.txt Documentation/s390/qeth.rst
··· 1 + ============================= 1 2 IBM s390 QDIO Ethernet Driver 3 + ============================= 2 4 3 5 OSA and HiperSockets Bridge Port Support 6 + ======================================== 4 7 5 8 Uevents 9 + ------- 6 10 7 11 To generate the events the device must be assigned a role of either 8 12 a primary or a secondary Bridge Port. For more information, see ··· 17 13 event with ACTION=CHANGE is emitted on behalf of the corresponding 18 14 ccwgroup device. The event has the following attributes: 19 15 20 - BRIDGEPORT=statechange - indicates that the Bridge Port device changed 16 + BRIDGEPORT=statechange 17 + indicates that the Bridge Port device changed 21 18 its state. 22 19 23 - ROLE={primary|secondary|none} - the role assigned to the port. 20 + ROLE={primary|secondary|none} 21 + the role assigned to the port. 24 22 25 - STATE={active|standby|inactive} - the newly assumed state of the port. 23 + STATE={active|standby|inactive} 24 + the newly assumed state of the port. 26 25 27 26 When run on HiperSockets Bridge Capable Port hardware with host address 28 27 notifications enabled, a udev event with ACTION=CHANGE is emitted. ··· 33 26 or a VLAN is registered or unregistered on the network served by the device. 34 27 The event has the following attributes: 35 28 36 - BRIDGEDHOST={reset|register|deregister|abort} - host address 29 + BRIDGEDHOST={reset|register|deregister|abort} 30 + host address 37 31 notifications are started afresh, a new host or VLAN is registered or 38 32 deregistered on the Bridge Port HiperSockets channel, or address 39 33 notifications are aborted. 40 34 41 - VLAN=numeric-vlan-id - VLAN ID on which the event occurred. Not included 35 + VLAN=numeric-vlan-id 36 + VLAN ID on which the event occurred. Not included 42 37 if no VLAN is involved in the event. 
43 38 44 - MAC=xx:xx:xx:xx:xx:xx - MAC address of the host that is being registered 39 + MAC=xx:xx:xx:xx:xx:xx 40 + MAC address of the host that is being registered 45 41 or deregistered from the HiperSockets channel. Not reported if the 46 42 event reports the creation or destruction of a VLAN. 47 43 48 - NTOK_BUSID=x.y.zzzz - device bus ID (CSSID, SSID and device number). 44 + NTOK_BUSID=x.y.zzzz 45 + device bus ID (CSSID, SSID and device number). 49 46 50 - NTOK_IID=xx - device IID. 47 + NTOK_IID=xx 48 + device IID. 51 49 52 - NTOK_CHPID=xx - device CHPID. 50 + NTOK_CHPID=xx 51 + device CHPID. 53 52 54 - NTOK_CHID=xxxx - device channel ID. 53 + NTOK_CHID=xxxx 54 + device channel ID. 55 55 56 - Note that the NTOK_* attributes refer to devices other than the one 56 + Note that the `NTOK_*` attributes refer to devices other than the one 57 57 connected to the system on which the OS is running.
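A uevent consumer may want to split the NTOK_BUSID value (documented above as x.y.zzzz: CSSID, SSID and device number) into its three parts. The sketch below is user-space C and rests on one assumption not stated in the text: that all three components are written in hexadecimal. `struct bus_id` and `parse_bus_id` are hypothetical names.

```c
#include <stdio.h>

struct bus_id {
	unsigned int cssid;
	unsigned int ssid;
	unsigned int devno;
};

/*
 * Parse "x.y.zzzz" into its components.
 * Returns 0 on success, -1 if the string does not consist of exactly
 * three dot-separated hex numbers or the device number exceeds 16 bits.
 */
static int parse_bus_id(const char *s, struct bus_id *out)
{
	char trailing;
	int n;

	/* The final %c only converts if unexpected text follows the ID. */
	n = sscanf(s, "%x.%x.%x%c", &out->cssid, &out->ssid,
		   &out->devno, &trailing);
	if (n != 3)
		return -1;
	if (out->devno > 0xffff)
		return -1;
	return 0;
}
```

The value part of an `NTOK_BUSID=...` attribute would be passed in after stripping the key and the equals sign.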
+803
Documentation/s390/s390dbf.rst
··· 1 + ================== 2 + S390 Debug Feature 3 + ================== 4 + 5 + files: 6 + - arch/s390/kernel/debug.c 7 + - arch/s390/include/asm/debug.h 8 + 9 + Description: 10 + ------------ 11 + The goal of this feature is to provide a kernel debug logging API 12 + where log records can be stored efficiently in memory, where each component 13 + (e.g. device drivers) can have one separate debug log. 14 + One purpose of this is to inspect the debug logs after a production system crash 15 + in order to analyze the reason for the crash. 16 + 17 + If the system still runs but only a subcomponent which uses dbf fails, 18 + it is possible to look at the debug logs on a live system via the Linux 19 + debugfs filesystem. 20 + 21 + The debug feature may also very useful for kernel and driver development. 22 + 23 + Design: 24 + ------- 25 + Kernel components (e.g. device drivers) can register themselves at the debug 26 + feature with the function call debug_register(). This function initializes a 27 + debug log for the caller. For each debug log exists a number of debug areas 28 + where exactly one is active at one time. Each debug area consists of contiguous 29 + pages in memory. In the debug areas there are stored debug entries (log records) 30 + which are written by event- and exception-calls. 31 + 32 + An event-call writes the specified debug entry to the active debug 33 + area and updates the log pointer for the active area. If the end 34 + of the active debug area is reached, a wrap around is done (ring buffer) 35 + and the next debug entry will be written at the beginning of the active 36 + debug area. 37 + 38 + An exception-call writes the specified debug entry to the log and 39 + switches to the next debug area. This is done in order to be sure 40 + that the records which describe the origin of the exception are not 41 + overwritten when a wrap around for the current area occurs. 42 + 43 + The debug areas themselves are also ordered in form of a ring buffer. 
44 + When an exception is thrown in the last debug area, the following debug 45 + entries are then written again in the very first area. 46 + 47 + There are three versions for the event- and exception-calls: One for 48 + logging raw data, one for text and one for numbers. 49 + 50 + Each debug entry contains the following data: 51 + 52 + - Timestamp 53 + - Cpu-Number of calling task 54 + - Level of debug entry (0...6) 55 + - Return Address to caller 56 + - Flag, if entry is an exception or not 57 + 58 + The debug logs can be inspected in a live system through entries in 59 + the debugfs-filesystem. Under the toplevel directory "s390dbf" there is 60 + a directory for each registered component, which is named like the 61 + corresponding component. The debugfs normally should be mounted to 62 + /sys/kernel/debug therefore the debug feature can be accessed under 63 + /sys/kernel/debug/s390dbf. 64 + 65 + The content of the directories are files which represent different views 66 + to the debug log. Each component can decide which views should be 67 + used through registering them with the function debug_register_view(). 68 + Predefined views for hex/ascii, sprintf and raw binary data are provided. 69 + It is also possible to define other views. The content of 70 + a view can be inspected simply by reading the corresponding debugfs file. 71 + 72 + All debug logs have an actual debug level (range from 0 to 6). 73 + The default level is 3. Event and Exception functions have a 'level' 74 + parameter. Only debug entries with a level that is lower or equal 75 + than the actual level are written to the log. This means, when 76 + writing events, high priority log entries should have a low level 77 + value whereas low priority entries should have a high one. 78 + The actual debug level can be changed with the help of the debugfs-filesystem 79 + through writing a number string "x" to the 'level' debugfs file which is 80 + provided for every debug log. 
Debugging can be switched off completely 81 + by using "-" on the 'level' debugfs file. 82 + 83 + Example:: 84 + 85 + > echo "-" > /sys/kernel/debug/s390dbf/dasd/level 86 + 87 + It is also possible to deactivate the debug feature globally for every 88 + debug log. You can change the behavior using 2 sysctl parameters in 89 + /proc/sys/s390dbf: 90 + 91 + There are currently 2 possible triggers, which stop the debug feature 92 + globally. The first possibility is to use the "debug_active" sysctl. If 93 + set to 1 the debug feature is running. If "debug_active" is set to 0 the 94 + debug feature is turned off. 95 + 96 + The second trigger which stops the debug feature is a kernel oops. 97 + That prevents the debug feature from overwriting debug information that 98 + happened before the oops. After an oops you can reactivate the debug feature 99 + by piping 1 to /proc/sys/s390dbf/debug_active. Nevertheless, its not 100 + suggested to use an oopsed kernel in a production environment. 101 + 102 + If you want to disallow the deactivation of the debug feature, you can use 103 + the "debug_stoppable" sysctl. If you set "debug_stoppable" to 0 the debug 104 + feature cannot be stopped. If the debug feature is already stopped, it 105 + will stay deactivated. 106 + 107 + ---------------------------------------------------------------------------- 108 + 109 + Kernel Interfaces: 110 + ------------------ 111 + 112 + :: 113 + 114 + debug_info_t *debug_register(char *name, int pages, int nr_areas, 115 + int buf_size); 116 + 117 + Parameter: 118 + name: 119 + Name of debug log (e.g. 
used for debugfs entry) 120 + pages: 121 + Number of pages, which will be allocated per area 122 + nr_areas: 123 + Number of debug areas 124 + buf_size: 125 + Size of data area in each debug entry 126 + 127 + Return Value: 128 + Handle for generated debug area 129 + 130 + NULL if register failed 131 + 132 + Description: Allocates memory for a debug log 133 + Must not be called within an interrupt handler 134 + 135 + ---------------------------------------------------------------------------- 136 + 137 + :: 138 + 139 + debug_info_t *debug_register_mode(char *name, int pages, int nr_areas, 140 + int buf_size, mode_t mode, uid_t uid, 141 + gid_t gid); 142 + 143 + Parameter: 144 + name: 145 + Name of debug log (e.g. used for debugfs entry) 146 + pages: 147 + Number of pages, which will be allocated per area 148 + nr_areas: 149 + Number of debug areas 150 + buf_size: 151 + Size of data area in each debug entry 152 + mode: 153 + File mode for debugfs files. E.g. S_IRWXUGO 154 + uid: 155 + User ID for debugfs files. Currently only 0 is 156 + supported. 157 + gid: 158 + Group ID for debugfs files. Currently only 0 is 159 + supported. 160 + 161 + Return Value: 162 + Handle for generated debug area 163 + 164 + NULL if register failed 165 + 166 + Description: 167 + Allocates memory for a debug log 168 + Must not be called within an interrupt handler 169 + 170 + --------------------------------------------------------------------------- 171 + 172 + :: 173 + 174 + void debug_unregister (debug_info_t * id); 175 + 176 + Parameter: 177 + id: 178 + handle for debug log 179 + 180 + Return Value: 181 + none 182 + 183 + Description: 184 + frees memory for a debug log and removes all registered debug 185 + views. 
186 + 187 + Must not be called within an interrupt handler 188 + 189 + --------------------------------------------------------------------------- 190 + 191 + :: 192 + 193 + void debug_set_level (debug_info_t * id, int new_level); 194 + 195 + Parameter: id: handle for debug log 196 + new_level: new debug level 197 + 198 + Return Value: 199 + none 200 + 201 + Description: 202 + Sets new actual debug level if new_level is valid. 203 + 204 + --------------------------------------------------------------------------- 205 + 206 + :: 207 + 208 + bool debug_level_enabled (debug_info_t * id, int level); 209 + 210 + Parameter: 211 + id: 212 + handle for debug log 213 + level: 214 + debug level 215 + 216 + Return Value: 217 + True if level is less or equal to the current debug level. 218 + 219 + Description: 220 + Returns true if debug events for the specified level would be 221 + logged. Otherwise returns false. 222 + 223 + --------------------------------------------------------------------------- 224 + 225 + :: 226 + 227 + void debug_stop_all(void); 228 + 229 + Parameter: 230 + none 231 + 232 + Return Value: 233 + none 234 + 235 + Description: 236 + stops the debug feature if stopping is allowed. Currently 237 + used in case of a kernel oops. 
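The level check described for debug_set_level() and debug_level_enabled() can be modelled in plain user-space C. This is only a sketch under stated assumptions: `struct dbf_model`, `dbf_model_set_level()` and `dbf_model_level_enabled()` are hypothetical names and not part of the kernel API, and the valid range 0...6 is taken from the description of the debug level in this document. An out-of-range level is simply ignored, which mirrors "Sets new actual debug level if new_level is valid".

```c
#include <stdbool.h>

/* Hypothetical user-space model; this is not the kernel's debug_info_t. */
#define DBF_MODEL_MIN_LEVEL 0
#define DBF_MODEL_MAX_LEVEL 6	/* documented level range is 0...6 */

struct dbf_model {
	int level;	/* the "actual" debug level of this log */
};

/* Change the level only if new_level lies in the documented range. */
void dbf_model_set_level(struct dbf_model *id, int new_level)
{
	if (new_level >= DBF_MODEL_MIN_LEVEL && new_level <= DBF_MODEL_MAX_LEVEL)
		id->level = new_level;
}

/* An entry would be logged when its level is <= the actual level. */
bool dbf_model_level_enabled(const struct dbf_model *id, int level)
{
	return level <= id->level;
}
```

In this model a log at the default level 3 accepts entries of level 0 to 3 and drops entries of level 4 to 6, so high priority entries should carry low level values.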
238 + 239 + --------------------------------------------------------------------------- 240 + 241 + :: 242 + 243 + debug_entry_t* debug_event (debug_info_t* id, int level, void* data, 244 + int length); 245 + 246 + Parameter: 247 + id: 248 + handle for debug log 249 + level: 250 + debug level 251 + data: 252 + pointer to data for debug entry 253 + length: 254 + length of data in bytes 255 + 256 + Return Value: 257 + Address of written debug entry 258 + 259 + Description: 260 + writes debug entry to active debug area (if level <= actual 261 + debug level) 262 + 263 + --------------------------------------------------------------------------- 264 + 265 + :: 266 + 267 + debug_entry_t* debug_int_event (debug_info_t * id, int level, 268 + unsigned int data); 269 + debug_entry_t* debug_long_event(debug_info_t * id, int level, 270 + unsigned long data); 271 + 272 + Parameter: 273 + id: 274 + handle for debug log 275 + level: 276 + debug level 277 + data: 278 + integer value for debug entry 279 + 280 + Return Value: 281 + Address of written debug entry 282 + 283 + Description: 284 + writes debug entry to active debug area (if level <= actual 285 + debug level) 286 + 287 + --------------------------------------------------------------------------- 288 + 289 + :: 290 + 291 + debug_entry_t* debug_text_event (debug_info_t * id, int level, 292 + const char* data); 293 + 294 + Parameter: 295 + id: 296 + handle for debug log 297 + level: 298 + debug level 299 + data: 300 + string for debug entry 301 + 302 + Return Value: 303 + Address of written debug entry 304 + 305 + Description: 306 + writes debug entry in ascii format to active debug area 307 + (if level <= actual debug level) 308 + 309 + --------------------------------------------------------------------------- 310 + 311 + :: 312 + 313 + debug_entry_t* debug_sprintf_event (debug_info_t * id, int level, 314 + char* string,...); 315 + 316 + Parameter: 317 + id: 318 + handle for debug log 319 + level: 320 + debug level 321 + 
string: 322 + format string for debug entry 323 + ...: 324 + varargs used as in sprintf() 325 + 326 + Return Value: Address of written debug entry 327 + 328 + Description: 329 + writes debug entry with format string and varargs (longs) to 330 + active debug area (if level $<=$ actual debug level). 331 + floats and long long datatypes cannot be used as varargs. 332 + 333 + --------------------------------------------------------------------------- 334 + 335 + :: 336 + 337 + debug_entry_t* debug_exception (debug_info_t* id, int level, void* data, 338 + int length); 339 + 340 + Parameter: 341 + id: 342 + handle for debug log 343 + level: 344 + debug level 345 + data: 346 + pointer to data for debug entry 347 + length: 348 + length of data in bytes 349 + 350 + Return Value: 351 + Address of written debug entry 352 + 353 + Description: 354 + writes debug entry to active debug area (if level <= actual 355 + debug level) and switches to next debug area 356 + 357 + --------------------------------------------------------------------------- 358 + 359 + :: 360 + 361 + debug_entry_t* debug_int_exception (debug_info_t * id, int level, 362 + unsigned int data); 363 + debug_entry_t* debug_long_exception(debug_info_t * id, int level, 364 + unsigned long data); 365 + 366 + Parameter: id: handle for debug log 367 + level: debug level 368 + data: integer value for debug entry 369 + 370 + Return Value: Address of written debug entry 371 + 372 + Description: writes debug entry to active debug area (if level <= actual 373 + debug level) and switches to next debug area 374 + 375 + --------------------------------------------------------------------------- 376 + 377 + :: 378 + 379 + debug_entry_t* debug_text_exception (debug_info_t * id, int level, 380 + const char* data); 381 + 382 + Parameter: id: handle for debug log 383 + level: debug level 384 + data: string for debug entry 385 + 386 + Return Value: Address of written debug entry 387 + 388 + Description: writes debug entry in ascii 
format to active debug area 389 + (if level <= actual debug level) and switches to next debug 390 + area 391 + 392 + --------------------------------------------------------------------------- 393 + 394 + :: 395 + 396 + debug_entry_t* debug_sprintf_exception (debug_info_t * id, int level, 397 + char* string,...); 398 + 399 + Parameter: id: handle for debug log 400 + level: debug level 401 + string: format string for debug entry 402 + ...: varargs used as in sprintf() 403 + 404 + Return Value: Address of written debug entry 405 + 406 + Description: writes debug entry with format string and varargs (longs) to 407 + active debug area (if level $<=$ actual debug level) and 408 + switches to next debug area. 409 + floats and long long datatypes cannot be used as varargs. 410 + 411 + --------------------------------------------------------------------------- 412 + 413 + :: 414 + 415 + int debug_register_view (debug_info_t * id, struct debug_view *view); 416 + 417 + Parameter: id: handle for debug log 418 + view: pointer to debug view struct 419 + 420 + Return Value: 0 : ok 421 + < 0: Error 422 + 423 + Description: registers new debug view and creates debugfs dir entry 424 + 425 + --------------------------------------------------------------------------- 426 + 427 + :: 428 + 429 + int debug_unregister_view (debug_info_t * id, struct debug_view *view); 430 + 431 + Parameter: id: handle for debug log 432 + view: pointer to debug view struct 433 + 434 + Return Value: 0 : ok 435 + < 0: Error 436 + 437 + Description: unregisters debug view and removes debugfs dir entry 438 + 439 + 440 + 441 + Predefined views: 442 + ----------------- 443 + 444 + extern struct debug_view debug_hex_ascii_view; 445 + 446 + extern struct debug_view debug_raw_view; 447 + 448 + extern struct debug_view debug_sprintf_view; 449 + 450 + Examples 451 + -------- 452 + 453 + :: 454 + 455 + /* 456 + * hex_ascii- + raw-view Example 457 + */ 458 + 459 + #include <linux/init.h> 460 + #include <asm/debug.h> 
461 + 462 + static debug_info_t* debug_info; 463 + 464 + static int init(void) 465 + { 466 + /* register 4 debug areas with one page each and 4 byte data field */ 467 + 468 + debug_info = debug_register ("test", 1, 4, 4 ); 469 + debug_register_view(debug_info,&debug_hex_ascii_view); 470 + debug_register_view(debug_info,&debug_raw_view); 471 + 472 + debug_text_event(debug_info, 4 , "one "); 473 + debug_int_exception(debug_info, 4, 4711); 474 + debug_event(debug_info, 3, &debug_info, 4); 475 + 476 + return 0; 477 + } 478 + 479 + static void cleanup(void) 480 + { 481 + debug_unregister (debug_info); 482 + } 483 + 484 + module_init(init); 485 + module_exit(cleanup); 486 + 487 + --------------------------------------------------------------------------- 488 + 489 + :: 490 + 491 + /* 492 + * sprintf-view Example 493 + */ 494 + 495 + #include <linux/init.h> 496 + #include <asm/debug.h> 497 + 498 + static debug_info_t* debug_info; 499 + 500 + static int init(void) 501 + { 502 + /* register 4 debug areas with one page each and data field for */ 503 + /* format string pointer + 2 varargs (= 3 * sizeof(long)) */ 504 + 505 + debug_info = debug_register ("test", 1, 4, sizeof(long) * 3); 506 + debug_register_view(debug_info,&debug_sprintf_view); 507 + 508 + debug_sprintf_event(debug_info, 2 , "first event in %s:%i\n",__FILE__,__LINE__); 509 + debug_sprintf_exception(debug_info, 1, "pointer to debug info: %p\n",&debug_info); 510 + 511 + return 0; 512 + } 513 + 514 + static void cleanup(void) 515 + { 516 + debug_unregister (debug_info); 517 + } 518 + 519 + module_init(init); 520 + module_exit(cleanup); 521 + 522 + Debugfs Interface 523 + ----------------- 524 + Views to the debug logs can be investigated through reading the corresponding 525 + debugfs-files: 526 + 527 + Example:: 528 + 529 + > ls /sys/kernel/debug/s390dbf/dasd 530 + flush hex_ascii level pages raw 531 + > cat /sys/kernel/debug/s390dbf/dasd/hex_ascii | sort -k2,2 -s 532 + 00 00974733272:680099 2 - 02 0006ad7e 07 ea 
4a 90 | .... 533 + 00 00974733272:682210 2 - 02 0006ade6 46 52 45 45 | FREE 534 + 00 00974733272:682213 2 - 02 0006adf6 07 ea 4a 90 | .... 535 + 00 00974733272:682281 1 * 02 0006ab08 41 4c 4c 43 | EXCP 536 + 01 00974733272:682284 2 - 02 0006ab16 45 43 4b 44 | ECKD 537 + 01 00974733272:682287 2 - 02 0006ab28 00 00 00 04 | .... 538 + 01 00974733272:682289 2 - 02 0006ab3e 00 00 00 20 | ... 539 + 01 00974733272:682297 2 - 02 0006ad7e 07 ea 4a 90 | .... 540 + 01 00974733272:684384 2 - 00 0006ade6 46 52 45 45 | FREE 541 + 01 00974733272:684388 2 - 00 0006adf6 07 ea 4a 90 | .... 542 + 543 + See section about predefined views for explanation of the above output! 544 + 545 + Changing the debug level 546 + ------------------------ 547 + 548 + Example:: 549 + 550 + 551 + > cat /sys/kernel/debug/s390dbf/dasd/level 552 + 3 553 + > echo "5" > /sys/kernel/debug/s390dbf/dasd/level 554 + > cat /sys/kernel/debug/s390dbf/dasd/level 555 + 5 556 + 557 + Flushing debug areas 558 + -------------------- 559 + Debug areas can be flushed with piping the number of the desired 560 + area (0...n) to the debugfs file "flush". When using "-" all debug areas 561 + are flushed. 562 + 563 + Examples: 564 + 565 + 1. Flush debug area 0:: 566 + 567 + > echo "0" > /sys/kernel/debug/s390dbf/dasd/flush 568 + 569 + 2. Flush all debug areas:: 570 + 571 + > echo "-" > /sys/kernel/debug/s390dbf/dasd/flush 572 + 573 + Changing the size of debug areas 574 + ------------------------------------ 575 + It is possible the change the size of debug areas through piping 576 + the number of pages to the debugfs file "pages". The resize request will 577 + also flush the debug areas. 578 + 579 + Example: 580 + 581 + Define 4 pages for the debug areas of debug feature "dasd":: 582 + 583 + > echo "4" > /sys/kernel/debug/s390dbf/dasd/pages 584 + 585 + Stooping the debug feature 586 + -------------------------- 587 + Example: 588 + 589 + 1. 
Check if stopping is allowed:: 590 + 591 + > cat /proc/sys/s390dbf/debug_stoppable 592 + 593 + 2. Stop debug feature:: 594 + 595 + > echo 0 > /proc/sys/s390dbf/debug_active 596 + 597 + lcrash Interface 598 + ---------------- 599 + It is planned that the dump analysis tool lcrash gets an additional command 600 + 's390dbf' to display all the debug logs. With this tool it will be possible 601 + to investigate the debug logs on a live system and with a memory dump after 602 + a system crash. 603 + 604 + Investigating raw memory 605 + ------------------------ 606 + One last possibility to investigate the debug logs at a live 607 + system and after a system crash is to look at the raw memory 608 + under VM or at the Service Element. 609 + It is possible to find the anker of the debug-logs through 610 + the 'debug_area_first' symbol in the System map. Then one has 611 + to follow the correct pointers of the data-structures defined 612 + in debug.h and find the debug-areas in memory. 613 + Normally modules which use the debug feature will also have 614 + a global variable with the pointer to the debug-logs. Following 615 + this pointer it will also be possible to find the debug logs in 616 + memory. 617 + 618 + For this method it is recommended to use '16 * x + 4' byte (x = 0..n) 619 + for the length of the data field in debug_register() in 620 + order to see the debug entries well formatted. 621 + 622 + 623 + Predefined Views 624 + ---------------- 625 + 626 + There are three predefined views: hex_ascii, raw and sprintf. 627 + The hex_ascii view shows the data field in hex and ascii representation 628 + (e.g. '45 43 4b 44 | ECKD'). 629 + The raw view returns a bytestream as the debug areas are stored in memory. 630 + 631 + The sprintf view formats the debug entries in the same way as the sprintf 632 + function would do. 
The sprintf event/exception functions write to the 633 + debug entry a pointer to the format string (size = sizeof(long)) 634 + and for each vararg a long value. So e.g. for a debug entry with a format 635 + string plus two varargs one would need to allocate a (3 * sizeof(long)) 636 + byte data area in the debug_register() function. 637 + 638 + IMPORTANT: 639 + Using "%s" in sprintf event functions is dangerous. You can only 640 + use "%s" in the sprintf event functions, if the memory for the passed string 641 + is available as long as the debug feature exists. The reason behind this is 642 + that due to performance considerations only a pointer to the string is stored 643 + in the debug feature. If you log a string that is freed afterwards, you will 644 + get an OOPS when inspecting the debug feature, because then the debug feature 645 + will access the already freed memory. 646 + 647 + NOTE: 648 + If using the sprintf view do NOT use other event/exception functions 649 + than the sprintf-event and -exception functions. 
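The sizing rule for the sprintf view (one long for the format string pointer plus one long per vararg) can be written down as a small helper. This is a minimal sketch; `dbf_sprintf_buf_size()` is a hypothetical name introduced here for illustration and is not provided by the debug feature.

```c
#include <stddef.h>

/*
 * Data-area size needed by one sprintf-style debug entry:
 * one long-sized slot for the format string pointer,
 * plus one long for each vararg.
 * Hypothetical helper, not a kernel function.
 */
size_t dbf_sprintf_buf_size(unsigned int nr_varargs)
{
	return ((size_t)nr_varargs + 1) * sizeof(long);
}
```

For a format string with two varargs this yields 3 * sizeof(long), the same value that the sprintf-view example passes as buf_size to debug_register().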
650 + 651 + The format of the hex_ascii and sprintf view is as follows: 652 + 653 + - Number of area 654 + - Timestamp (formatted as seconds and microseconds since 00:00:00 Coordinated 655 + Universal Time (UTC), January 1, 1970) 656 + - level of debug entry 657 + - Exception flag (* = Exception) 658 + - Cpu-Number of calling task 659 + - Return Address to caller 660 + - data field 661 + 662 + The format of the raw view is: 663 + 664 + - Header as described in debug.h 665 + - datafield 666 + 667 + A typical line of the hex_ascii view will look like the following (first line 668 + is only for explanation and will not be displayed when 'cating' the view): 669 + 670 + area time level exception cpu caller data (hex + ascii) 671 + -------------------------------------------------------------------------- 672 + 00 00964419409:440690 1 - 00 88023fe 673 + 674 + 675 + Defining views 676 + -------------- 677 + 678 + Views are specified with the 'debug_view' structure. There are defined 679 + callback functions which are used for reading and writing the debugfs files:: 680 + 681 + struct debug_view { 682 + char name[DEBUG_MAX_PROCF_LEN]; 683 + debug_prolog_proc_t* prolog_proc; 684 + debug_header_proc_t* header_proc; 685 + debug_format_proc_t* format_proc; 686 + debug_input_proc_t* input_proc; 687 + void* private_data; 688 + }; 689 + 690 + where:: 691 + 692 + typedef int (debug_header_proc_t) (debug_info_t* id, 693 + struct debug_view* view, 694 + int area, 695 + debug_entry_t* entry, 696 + char* out_buf); 697 + 698 + typedef int (debug_format_proc_t) (debug_info_t* id, 699 + struct debug_view* view, char* out_buf, 700 + const char* in_buf); 701 + typedef int (debug_prolog_proc_t) (debug_info_t* id, 702 + struct debug_view* view, 703 + char* out_buf); 704 + typedef int (debug_input_proc_t) (debug_info_t* id, 705 + struct debug_view* view, 706 + struct file* file, const char* user_buf, 707 + size_t in_buf_size, loff_t* offset); 708 + 709 + 710 + The "private_data" member can be 
used as pointer to view specific data. 711 + It is not used by the debug feature itself. 712 + 713 + The output when reading a debugfs file is structured like this:: 714 + 715 + "prolog_proc output" 716 + 717 + "header_proc output 1" "format_proc output 1" 718 + "header_proc output 2" "format_proc output 2" 719 + "header_proc output 3" "format_proc output 3" 720 + ... 721 + 722 + When a view is read from the debugfs, the Debug Feature calls the 723 + 'prolog_proc' once for writing the prolog. 724 + Then 'header_proc' and 'format_proc' are called for each 725 + existing debug entry. 726 + 727 + The input_proc can be used to implement functionality when it is written to 728 + the view (e.g. like with 'echo "0" > /sys/kernel/debug/s390dbf/dasd/level). 729 + 730 + For header_proc there can be used the default function 731 + debug_dflt_header_fn() which is defined in debug.h. 732 + and which produces the same header output as the predefined views. 733 + E.g:: 734 + 735 + 00 00964419409:440761 2 - 00 88023ec 736 + 737 + In order to see how to use the callback functions check the implementation 738 + of the default views! 
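The call order described above (prolog once, then header and format for every entry) can be illustrated with a user-space model. This is a sketch under assumptions, not the kernel implementation: the `model_*` types and functions and the `demo_*` callbacks are hypothetical, the callback signatures are simplified compared to the typedefs in debug.h, and skipping NULL callbacks is a choice made for this model.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified callback types (the real ones take debug_info_t etc.). */
typedef int (model_prolog_fn)(char *out_buf);
typedef int (model_header_fn)(int entry_nr, char *out_buf);
typedef int (model_format_fn)(const char *in_buf, char *out_buf);

struct model_view {
	model_prolog_fn *prolog;	/* may be NULL in this model */
	model_header_fn *header;	/* may be NULL in this model */
	model_format_fn *format;	/* may be NULL in this model */
};

/*
 * Render a view: prolog once, then header + format for each entry.
 * Every callback returns the number of characters it wrote.
 * The caller must provide an output buffer that is large enough.
 */
int model_render(const struct model_view *view, const char *const *entries,
		 int nr_entries, char *out)
{
	int len = 0;
	int i;

	out[0] = '\0';
	if (view->prolog)
		len += view->prolog(out + len);
	for (i = 0; i < nr_entries; i++) {
		if (view->header)
			len += view->header(i, out + len);
		if (view->format)
			len += view->format(entries[i], out + len);
	}
	return len;
}

/* Demo callbacks that make the call order visible in the output. */
int demo_prolog(char *out_buf)
{
	return sprintf(out_buf, "prolog\n");
}

int demo_header(int entry_nr, char *out_buf)
{
	return sprintf(out_buf, "%02d ", entry_nr);
}

int demo_format(const char *in_buf, char *out_buf)
{
	return sprintf(out_buf, "%s\n", in_buf);
}
```

Rendering the two entries "FREE" and "ECKD" with the demo callbacks produces one prolog line followed by one "header data" line per entry, which matches the structure shown above.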
739 + 740 + Example:: 741 + 742 + #include <asm/debug.h> 743 + 744 + #define UNKNOWNSTR "data: %08x" 745 + 746 + const char* messages[] = 747 + {"This error...........\n", 748 + "That error...........\n", 749 + "Problem..............\n", 750 + "Something went wrong.\n", 751 + "Everything ok........\n", 752 + NULL 753 + }; 754 + 755 + static int debug_test_format_fn( 756 + debug_info_t * id, struct debug_view *view, 757 + char *out_buf, const char *in_buf 758 + ) 759 + { 760 + int i, rc = 0; 761 + 762 + if(id->buf_size >= 4) { 763 + int msg_nr = *((int*)in_buf); 764 + if(msg_nr < sizeof(messages)/sizeof(char*) - 1) 765 + rc += sprintf(out_buf, "%s", messages[msg_nr]); 766 + else 767 + rc += sprintf(out_buf, UNKNOWNSTR, msg_nr); 768 + } 769 + out: 770 + return rc; 771 + } 772 + 773 + struct debug_view debug_test_view = { 774 + "myview", /* name of view */ 775 + NULL, /* no prolog */ 776 + &debug_dflt_header_fn, /* default header for each entry */ 777 + &debug_test_format_fn, /* our own format function */ 778 + NULL, /* no input function */ 779 + NULL /* no private data */ 780 + }; 781 + 782 + test: 783 + ===== 784 + 785 + :: 786 + 787 + debug_info_t *debug_info; 788 + ... 789 + debug_info = debug_register ("test", 0, 4, 4 )); 790 + debug_register_view(debug_info, &debug_test_view); 791 + for(i = 0; i < 10; i ++) debug_int_event(debug_info, 1, i); 792 + 793 + > cat /sys/kernel/debug/s390dbf/test/myview 794 + 00 00964419734:611402 1 - 00 88042ca This error........... 795 + 00 00964419734:611405 1 - 00 88042ca That error........... 796 + 00 00964419734:611408 1 - 00 88042ca Problem.............. 797 + 00 00964419734:611411 1 - 00 88042ca Something went wrong. 798 + 00 00964419734:611414 1 - 00 88042ca Everything ok........ 
799 + 00 00964419734:611417 1 - 00 88042ca data: 00000005 800 + 00 00964419734:611419 1 - 00 88042ca data: 00000006 801 + 00 00964419734:611422 1 - 00 88042ca data: 00000007 802 + 00 00964419734:611425 1 - 00 88042ca data: 00000008 803 + 00 00964419734:611428 1 - 00 88042ca data: 00000009
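The message lookup performed by the custom format function can also be tried outside the kernel. The following is a stand-alone sketch, assuming a simplified signature: `model_format()` and `model_messages` are hypothetical names, the message number is passed directly instead of being read from the entry's data field, and the table has no NULL terminator because its size is computed at compile time.

```c
#include <stdio.h>

#define UNKNOWNSTR "data: %08x"

/* Same message texts as in the example view above. */
static const char *const model_messages[] = {
	"This error...........\n",
	"That error...........\n",
	"Problem..............\n",
	"Something went wrong.\n",
	"Everything ok........\n",
};

#define NR_MODEL_MESSAGES (sizeof(model_messages) / sizeof(model_messages[0]))

/*
 * Write the text for msg_nr to out_buf and return the number of
 * characters written. Unknown numbers are printed as hex data.
 */
int model_format(unsigned int msg_nr, char *out_buf)
{
	if (msg_nr < NR_MODEL_MESSAGES)
		return sprintf(out_buf, "%s", model_messages[msg_nr]);
	return sprintf(out_buf, UNKNOWNSTR, msg_nr);
}
```

Numbers 0 to 4 select a message text and anything larger falls back to the "data: ..." form, as in the output of the "myview" file shown above.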
-667
Documentation/s390/s390dbf.txt
··· 1 - S390 Debug Feature 2 - ================== 3 - 4 - files: arch/s390/kernel/debug.c 5 - arch/s390/include/asm/debug.h 6 - 7 - Description: 8 - ------------ 9 - The goal of this feature is to provide a kernel debug logging API 10 - where log records can be stored efficiently in memory, where each component 11 - (e.g. device drivers) can have one separate debug log. 12 - One purpose of this is to inspect the debug logs after a production system crash 13 - in order to analyze the reason for the crash. 14 - If the system still runs but only a subcomponent which uses dbf fails, 15 - it is possible to look at the debug logs on a live system via the Linux 16 - debugfs filesystem. 17 - The debug feature may also very useful for kernel and driver development. 18 - 19 - Design: 20 - ------- 21 - Kernel components (e.g. device drivers) can register themselves at the debug 22 - feature with the function call debug_register(). This function initializes a 23 - debug log for the caller. For each debug log exists a number of debug areas 24 - where exactly one is active at one time. Each debug area consists of contiguous 25 - pages in memory. In the debug areas there are stored debug entries (log records) 26 - which are written by event- and exception-calls. 27 - 28 - An event-call writes the specified debug entry to the active debug 29 - area and updates the log pointer for the active area. If the end 30 - of the active debug area is reached, a wrap around is done (ring buffer) 31 - and the next debug entry will be written at the beginning of the active 32 - debug area. 33 - 34 - An exception-call writes the specified debug entry to the log and 35 - switches to the next debug area. This is done in order to be sure 36 - that the records which describe the origin of the exception are not 37 - overwritten when a wrap around for the current area occurs. 38 - 39 - The debug areas themselves are also ordered in form of a ring buffer. 
40 - When an exception is thrown in the last debug area, the following debug 41 - entries are then written again in the very first area. 42 - 43 - There are three versions for the event- and exception-calls: One for 44 - logging raw data, one for text and one for numbers. 45 - 46 - Each debug entry contains the following data: 47 - 48 - - Timestamp 49 - - Cpu-Number of calling task 50 - - Level of debug entry (0...6) 51 - - Return Address to caller 52 - - Flag, if entry is an exception or not 53 - 54 - The debug logs can be inspected in a live system through entries in 55 - the debugfs-filesystem. Under the toplevel directory "s390dbf" there is 56 - a directory for each registered component, which is named like the 57 - corresponding component. The debugfs normally should be mounted to 58 - /sys/kernel/debug therefore the debug feature can be accessed under 59 - /sys/kernel/debug/s390dbf. 60 - 61 - The content of the directories are files which represent different views 62 - to the debug log. Each component can decide which views should be 63 - used through registering them with the function debug_register_view(). 64 - Predefined views for hex/ascii, sprintf and raw binary data are provided. 65 - It is also possible to define other views. The content of 66 - a view can be inspected simply by reading the corresponding debugfs file. 67 - 68 - All debug logs have an actual debug level (range from 0 to 6). 69 - The default level is 3. Event and Exception functions have a 'level' 70 - parameter. Only debug entries with a level that is lower or equal 71 - than the actual level are written to the log. This means, when 72 - writing events, high priority log entries should have a low level 73 - value whereas low priority entries should have a high one. 74 - The actual debug level can be changed with the help of the debugfs-filesystem 75 - through writing a number string "x" to the 'level' debugfs file which is 76 - provided for every debug log. 
Debugging can be switched off completely 77 - by using "-" on the 'level' debugfs file. 78 - 79 - Example: 80 - 81 - > echo "-" > /sys/kernel/debug/s390dbf/dasd/level 82 - 83 - It is also possible to deactivate the debug feature globally for every 84 - debug log. You can change the behavior using 2 sysctl parameters in 85 - /proc/sys/s390dbf: 86 - There are currently 2 possible triggers, which stop the debug feature 87 - globally. The first possibility is to use the "debug_active" sysctl. If 88 - set to 1 the debug feature is running. If "debug_active" is set to 0 the 89 - debug feature is turned off. 90 - The second trigger which stops the debug feature is a kernel oops. 91 - That prevents the debug feature from overwriting debug information that 92 - happened before the oops. After an oops you can reactivate the debug feature 93 - by piping 1 to /proc/sys/s390dbf/debug_active. Nevertheless, its not 94 - suggested to use an oopsed kernel in a production environment. 95 - If you want to disallow the deactivation of the debug feature, you can use 96 - the "debug_stoppable" sysctl. If you set "debug_stoppable" to 0 the debug 97 - feature cannot be stopped. If the debug feature is already stopped, it 98 - will stay deactivated. 99 - 100 - Kernel Interfaces: 101 - ------------------ 102 - 103 - ---------------------------------------------------------------------------- 104 - debug_info_t *debug_register(char *name, int pages, int nr_areas, 105 - int buf_size); 106 - 107 - Parameter: name: Name of debug log (e.g. 
used for debugfs entry) 108 - pages: number of pages, which will be allocated per area 109 - nr_areas: number of debug areas 110 - buf_size: size of data area in each debug entry 111 - 112 - Return Value: Handle for generated debug area 113 - NULL if register failed 114 - 115 - Description: Allocates memory for a debug log 116 - Must not be called within an interrupt handler 117 - 118 - ---------------------------------------------------------------------------- 119 - debug_info_t *debug_register_mode(char *name, int pages, int nr_areas, 120 - int buf_size, mode_t mode, uid_t uid, 121 - gid_t gid); 122 - 123 - Parameter: name: Name of debug log (e.g. used for debugfs entry) 124 - pages: Number of pages, which will be allocated per area 125 - nr_areas: Number of debug areas 126 - buf_size: Size of data area in each debug entry 127 - mode: File mode for debugfs files. E.g. S_IRWXUGO 128 - uid: User ID for debugfs files. Currently only 0 is 129 - supported. 130 - gid: Group ID for debugfs files. Currently only 0 is 131 - supported. 132 - 133 - Return Value: Handle for generated debug area 134 - NULL if register failed 135 - 136 - Description: Allocates memory for a debug log 137 - Must not be called within an interrupt handler 138 - 139 - --------------------------------------------------------------------------- 140 - void debug_unregister (debug_info_t * id); 141 - 142 - Parameter: id: handle for debug log 143 - 144 - Return Value: none 145 - 146 - Description: frees memory for a debug log and removes all registered debug 147 - views. 148 - Must not be called within an interrupt handler 149 - 150 - --------------------------------------------------------------------------- 151 - void debug_set_level (debug_info_t * id, int new_level); 152 - 153 - Parameter: id: handle for debug log 154 - new_level: new debug level 155 - 156 - Return Value: none 157 - 158 - Description: Sets new actual debug level if new_level is valid. 
159 - 160 - --------------------------------------------------------------------------- 161 - bool debug_level_enabled (debug_info_t * id, int level); 162 - 163 - Parameter: id: handle for debug log 164 - level: debug level 165 - 166 - Return Value: True if level is less or equal to the current debug level. 167 - 168 - Description: Returns true if debug events for the specified level would be 169 - logged. Otherwise returns false. 170 - --------------------------------------------------------------------------- 171 - void debug_stop_all(void); 172 - 173 - Parameter: none 174 - 175 - Return Value: none 176 - 177 - Description: stops the debug feature if stopping is allowed. Currently 178 - used in case of a kernel oops. 179 - 180 - --------------------------------------------------------------------------- 181 - debug_entry_t* debug_event (debug_info_t* id, int level, void* data, 182 - int length); 183 - 184 - Parameter: id: handle for debug log 185 - level: debug level 186 - data: pointer to data for debug entry 187 - length: length of data in bytes 188 - 189 - Return Value: Address of written debug entry 190 - 191 - Description: writes debug entry to active debug area (if level <= actual 192 - debug level) 193 - 194 - --------------------------------------------------------------------------- 195 - debug_entry_t* debug_int_event (debug_info_t * id, int level, 196 - unsigned int data); 197 - debug_entry_t* debug_long_event(debug_info_t * id, int level, 198 - unsigned long data); 199 - 200 - Parameter: id: handle for debug log 201 - level: debug level 202 - data: integer value for debug entry 203 - 204 - Return Value: Address of written debug entry 205 - 206 - Description: writes debug entry to active debug area (if level <= actual 207 - debug level) 208 - 209 - --------------------------------------------------------------------------- 210 - debug_entry_t* debug_text_event (debug_info_t * id, int level, 211 - const char* data); 212 - 213 - Parameter: id: handle for 
debug log 214 - level: debug level 215 - data: string for debug entry 216 - 217 - Return Value: Address of written debug entry 218 - 219 - Description: writes debug entry in ascii format to active debug area 220 - (if level <= actual debug level) 221 - 222 - --------------------------------------------------------------------------- 223 - debug_entry_t* debug_sprintf_event (debug_info_t * id, int level, 224 - char* string,...); 225 - 226 - Parameter: id: handle for debug log 227 - level: debug level 228 - string: format string for debug entry 229 - ...: varargs used as in sprintf() 230 - 231 - Return Value: Address of written debug entry 232 - 233 - Description: writes debug entry with format string and varargs (longs) to 234 - active debug area (if level $<=$ actual debug level). 235 - floats and long long datatypes cannot be used as varargs. 236 - 237 - --------------------------------------------------------------------------- 238 - 239 - debug_entry_t* debug_exception (debug_info_t* id, int level, void* data, 240 - int length); 241 - 242 - Parameter: id: handle for debug log 243 - level: debug level 244 - data: pointer to data for debug entry 245 - length: length of data in bytes 246 - 247 - Return Value: Address of written debug entry 248 - 249 - Description: writes debug entry to active debug area (if level <= actual 250 - debug level) and switches to next debug area 251 - 252 - --------------------------------------------------------------------------- 253 - debug_entry_t* debug_int_exception (debug_info_t * id, int level, 254 - unsigned int data); 255 - debug_entry_t* debug_long_exception(debug_info_t * id, int level, 256 - unsigned long data); 257 - 258 - Parameter: id: handle for debug log 259 - level: debug level 260 - data: integer value for debug entry 261 - 262 - Return Value: Address of written debug entry 263 - 264 - Description: writes debug entry to active debug area (if level <= actual 265 - debug level) and switches to next debug area 266 - 267 
- --------------------------------------------------------------------------- 268 - debug_entry_t* debug_text_exception (debug_info_t * id, int level, 269 - const char* data); 270 - 271 - Parameter: id: handle for debug log 272 - level: debug level 273 - data: string for debug entry 274 - 275 - Return Value: Address of written debug entry 276 - 277 - Description: writes debug entry in ascii format to active debug area 278 - (if level <= actual debug level) and switches to next debug 279 - area 280 - 281 - --------------------------------------------------------------------------- 282 - debug_entry_t* debug_sprintf_exception (debug_info_t * id, int level, 283 - char* string,...); 284 - 285 - Parameter: id: handle for debug log 286 - level: debug level 287 - string: format string for debug entry 288 - ...: varargs used as in sprintf() 289 - 290 - Return Value: Address of written debug entry 291 - 292 - Description: writes debug entry with format string and varargs (longs) to 293 - active debug area (if level $<=$ actual debug level) and 294 - switches to next debug area. 295 - floats and long long datatypes cannot be used as varargs. 
296 - 297 - --------------------------------------------------------------------------- 298 - 299 - int debug_register_view (debug_info_t * id, struct debug_view *view); 300 - 301 - Parameter: id: handle for debug log 302 - view: pointer to debug view struct 303 - 304 - Return Value: 0 : ok 305 - < 0: Error 306 - 307 - Description: registers new debug view and creates debugfs dir entry 308 - 309 - --------------------------------------------------------------------------- 310 - int debug_unregister_view (debug_info_t * id, struct debug_view *view); 311 - 312 - Parameter: id: handle for debug log 313 - view: pointer to debug view struct 314 - 315 - Return Value: 0 : ok 316 - < 0: Error 317 - 318 - Description: unregisters debug view and removes debugfs dir entry 319 - 320 - 321 - 322 - Predefined views: 323 - ----------------- 324 - 325 - extern struct debug_view debug_hex_ascii_view; 326 - extern struct debug_view debug_raw_view; 327 - extern struct debug_view debug_sprintf_view; 328 - 329 - Examples 330 - -------- 331 - 332 - /* 333 - * hex_ascii- + raw-view Example 334 - */ 335 - 336 - #include <linux/init.h> 337 - #include <asm/debug.h> 338 - 339 - static debug_info_t* debug_info; 340 - 341 - static int init(void) 342 - { 343 - /* register 4 debug areas with one page each and 4 byte data field */ 344 - 345 - debug_info = debug_register ("test", 1, 4, 4 ); 346 - debug_register_view(debug_info,&debug_hex_ascii_view); 347 - debug_register_view(debug_info,&debug_raw_view); 348 - 349 - debug_text_event(debug_info, 4 , "one "); 350 - debug_int_exception(debug_info, 4, 4711); 351 - debug_event(debug_info, 3, &debug_info, 4); 352 - 353 - return 0; 354 - } 355 - 356 - static void cleanup(void) 357 - { 358 - debug_unregister (debug_info); 359 - } 360 - 361 - module_init(init); 362 - module_exit(cleanup); 363 - 364 - --------------------------------------------------------------------------- 365 - 366 - /* 367 - * sprintf-view Example 368 - */ 369 - 370 - #include 
<linux/init.h> 371 - #include <asm/debug.h> 372 - 373 - static debug_info_t* debug_info; 374 - 375 - static int init(void) 376 - { 377 - /* register 4 debug areas with one page each and data field for */ 378 - /* format string pointer + 2 varargs (= 3 * sizeof(long)) */ 379 - 380 - debug_info = debug_register ("test", 1, 4, sizeof(long) * 3); 381 - debug_register_view(debug_info,&debug_sprintf_view); 382 - 383 - debug_sprintf_event(debug_info, 2 , "first event in %s:%i\n",__FILE__,__LINE__); 384 - debug_sprintf_exception(debug_info, 1, "pointer to debug info: %p\n",&debug_info); 385 - 386 - return 0; 387 - } 388 - 389 - static void cleanup(void) 390 - { 391 - debug_unregister (debug_info); 392 - } 393 - 394 - module_init(init); 395 - module_exit(cleanup); 396 - 397 - 398 - 399 - Debugfs Interface 400 - ---------------- 401 - Views to the debug logs can be investigated through reading the corresponding 402 - debugfs-files: 403 - 404 - Example: 405 - 406 - > ls /sys/kernel/debug/s390dbf/dasd 407 - flush hex_ascii level pages raw 408 - > cat /sys/kernel/debug/s390dbf/dasd/hex_ascii | sort -k2,2 -s 409 - 00 00974733272:680099 2 - 02 0006ad7e 07 ea 4a 90 | .... 410 - 00 00974733272:682210 2 - 02 0006ade6 46 52 45 45 | FREE 411 - 00 00974733272:682213 2 - 02 0006adf6 07 ea 4a 90 | .... 412 - 00 00974733272:682281 1 * 02 0006ab08 41 4c 4c 43 | EXCP 413 - 01 00974733272:682284 2 - 02 0006ab16 45 43 4b 44 | ECKD 414 - 01 00974733272:682287 2 - 02 0006ab28 00 00 00 04 | .... 415 - 01 00974733272:682289 2 - 02 0006ab3e 00 00 00 20 | ... 416 - 01 00974733272:682297 2 - 02 0006ad7e 07 ea 4a 90 | .... 417 - 01 00974733272:684384 2 - 00 0006ade6 46 52 45 45 | FREE 418 - 01 00974733272:684388 2 - 00 0006adf6 07 ea 4a 90 | .... 419 - 420 - See section about predefined views for explanation of the above output! 
421 - 422 - Changing the debug level 423 - ------------------------ 424 - 425 - Example: 426 - 427 - 428 - > cat /sys/kernel/debug/s390dbf/dasd/level 429 - 3 430 - > echo "5" > /sys/kernel/debug/s390dbf/dasd/level 431 - > cat /sys/kernel/debug/s390dbf/dasd/level 432 - 5 433 - 434 - Flushing debug areas 435 - -------------------- 436 - Debug areas can be flushed with piping the number of the desired 437 - area (0...n) to the debugfs file "flush". When using "-" all debug areas 438 - are flushed. 439 - 440 - Examples: 441 - 442 - 1. Flush debug area 0: 443 - > echo "0" > /sys/kernel/debug/s390dbf/dasd/flush 444 - 445 - 2. Flush all debug areas: 446 - > echo "-" > /sys/kernel/debug/s390dbf/dasd/flush 447 - 448 - Changing the size of debug areas 449 - ------------------------------------ 450 - It is possible the change the size of debug areas through piping 451 - the number of pages to the debugfs file "pages". The resize request will 452 - also flush the debug areas. 453 - 454 - Example: 455 - 456 - Define 4 pages for the debug areas of debug feature "dasd": 457 - > echo "4" > /sys/kernel/debug/s390dbf/dasd/pages 458 - 459 - Stooping the debug feature 460 - -------------------------- 461 - Example: 462 - 463 - 1. Check if stopping is allowed 464 - > cat /proc/sys/s390dbf/debug_stoppable 465 - 2. Stop debug feature 466 - > echo 0 > /proc/sys/s390dbf/debug_active 467 - 468 - lcrash Interface 469 - ---------------- 470 - It is planned that the dump analysis tool lcrash gets an additional command 471 - 's390dbf' to display all the debug logs. With this tool it will be possible 472 - to investigate the debug logs on a live system and with a memory dump after 473 - a system crash. 474 - 475 - Investigating raw memory 476 - ------------------------ 477 - One last possibility to investigate the debug logs at a live 478 - system and after a system crash is to look at the raw memory 479 - under VM or at the Service Element. 
480 - It is possible to find the anker of the debug-logs through 481 - the 'debug_area_first' symbol in the System map. Then one has 482 - to follow the correct pointers of the data-structures defined 483 - in debug.h and find the debug-areas in memory. 484 - Normally modules which use the debug feature will also have 485 - a global variable with the pointer to the debug-logs. Following 486 - this pointer it will also be possible to find the debug logs in 487 - memory. 488 - 489 - For this method it is recommended to use '16 * x + 4' byte (x = 0..n) 490 - for the length of the data field in debug_register() in 491 - order to see the debug entries well formatted. 492 - 493 - 494 - Predefined Views 495 - ---------------- 496 - 497 - There are three predefined views: hex_ascii, raw and sprintf. 498 - The hex_ascii view shows the data field in hex and ascii representation 499 - (e.g. '45 43 4b 44 | ECKD'). 500 - The raw view returns a bytestream as the debug areas are stored in memory. 501 - 502 - The sprintf view formats the debug entries in the same way as the sprintf 503 - function would do. The sprintf event/exception functions write to the 504 - debug entry a pointer to the format string (size = sizeof(long)) 505 - and for each vararg a long value. So e.g. for a debug entry with a format 506 - string plus two varargs one would need to allocate a (3 * sizeof(long)) 507 - byte data area in the debug_register() function. 508 - 509 - IMPORTANT: Using "%s" in sprintf event functions is dangerous. You can only 510 - use "%s" in the sprintf event functions, if the memory for the passed string is 511 - available as long as the debug feature exists. The reason behind this is that 512 - due to performance considerations only a pointer to the string is stored in 513 - the debug feature. If you log a string that is freed afterwards, you will get 514 - an OOPS when inspecting the debug feature, because then the debug feature will 515 - access the already freed memory. 
516 - 517 - NOTE: If using the sprintf view do NOT use other event/exception functions 518 - than the sprintf-event and -exception functions. 519 - 520 - The format of the hex_ascii and sprintf view is as follows: 521 - - Number of area 522 - - Timestamp (formatted as seconds and microseconds since 00:00:00 Coordinated 523 - Universal Time (UTC), January 1, 1970) 524 - - level of debug entry 525 - - Exception flag (* = Exception) 526 - - Cpu-Number of calling task 527 - - Return Address to caller 528 - - data field 529 - 530 - The format of the raw view is: 531 - - Header as described in debug.h 532 - - datafield 533 - 534 - A typical line of the hex_ascii view will look like the following (first line 535 - is only for explanation and will not be displayed when 'cating' the view): 536 - 537 - area time level exception cpu caller data (hex + ascii) 538 - -------------------------------------------------------------------------- 539 - 00 00964419409:440690 1 - 00 88023fe 540 - 541 - 542 - Defining views 543 - -------------- 544 - 545 - Views are specified with the 'debug_view' structure. 
There are defined 546 - callback functions which are used for reading and writing the debugfs files: 547 - 548 - struct debug_view { 549 - char name[DEBUG_MAX_PROCF_LEN]; 550 - debug_prolog_proc_t* prolog_proc; 551 - debug_header_proc_t* header_proc; 552 - debug_format_proc_t* format_proc; 553 - debug_input_proc_t* input_proc; 554 - void* private_data; 555 - }; 556 - 557 - where 558 - 559 - typedef int (debug_header_proc_t) (debug_info_t* id, 560 - struct debug_view* view, 561 - int area, 562 - debug_entry_t* entry, 563 - char* out_buf); 564 - 565 - typedef int (debug_format_proc_t) (debug_info_t* id, 566 - struct debug_view* view, char* out_buf, 567 - const char* in_buf); 568 - typedef int (debug_prolog_proc_t) (debug_info_t* id, 569 - struct debug_view* view, 570 - char* out_buf); 571 - typedef int (debug_input_proc_t) (debug_info_t* id, 572 - struct debug_view* view, 573 - struct file* file, const char* user_buf, 574 - size_t in_buf_size, loff_t* offset); 575 - 576 - 577 - The "private_data" member can be used as pointer to view specific data. 578 - It is not used by the debug feature itself. 579 - 580 - The output when reading a debugfs file is structured like this: 581 - 582 - "prolog_proc output" 583 - 584 - "header_proc output 1" "format_proc output 1" 585 - "header_proc output 2" "format_proc output 2" 586 - "header_proc output 3" "format_proc output 3" 587 - ... 588 - 589 - When a view is read from the debugfs, the Debug Feature calls the 590 - 'prolog_proc' once for writing the prolog. 591 - Then 'header_proc' and 'format_proc' are called for each 592 - existing debug entry. 593 - 594 - The input_proc can be used to implement functionality when it is written to 595 - the view (e.g. like with 'echo "0" > /sys/kernel/debug/s390dbf/dasd/level). 596 - 597 - For header_proc there can be used the default function 598 - debug_dflt_header_fn() which is defined in debug.h. 599 - and which produces the same header output as the predefined views. 
600 - E.g: 601 - 00 00964419409:440761 2 - 00 88023ec 602 - 603 - In order to see how to use the callback functions check the implementation 604 - of the default views! 605 - 606 - Example 607 - 608 - #include <asm/debug.h> 609 - 610 - #define UNKNOWNSTR "data: %08x" 611 - 612 - const char* messages[] = 613 - {"This error...........\n", 614 - "That error...........\n", 615 - "Problem..............\n", 616 - "Something went wrong.\n", 617 - "Everything ok........\n", 618 - NULL 619 - }; 620 - 621 - static int debug_test_format_fn( 622 - debug_info_t * id, struct debug_view *view, 623 - char *out_buf, const char *in_buf 624 - ) 625 - { 626 - int i, rc = 0; 627 - 628 - if(id->buf_size >= 4) { 629 - int msg_nr = *((int*)in_buf); 630 - if(msg_nr < sizeof(messages)/sizeof(char*) - 1) 631 - rc += sprintf(out_buf, "%s", messages[msg_nr]); 632 - else 633 - rc += sprintf(out_buf, UNKNOWNSTR, msg_nr); 634 - } 635 - out: 636 - return rc; 637 - } 638 - 639 - struct debug_view debug_test_view = { 640 - "myview", /* name of view */ 641 - NULL, /* no prolog */ 642 - &debug_dflt_header_fn, /* default header for each entry */ 643 - &debug_test_format_fn, /* our own format function */ 644 - NULL, /* no input function */ 645 - NULL /* no private data */ 646 - }; 647 - 648 - ===== 649 - test: 650 - ===== 651 - debug_info_t *debug_info; 652 - ... 653 - debug_info = debug_register ("test", 0, 4, 4 )); 654 - debug_register_view(debug_info, &debug_test_view); 655 - for(i = 0; i < 10; i ++) debug_int_event(debug_info, 1, i); 656 - 657 - > cat /sys/kernel/debug/s390dbf/test/myview 658 - 00 00964419734:611402 1 - 00 88042ca This error........... 659 - 00 00964419734:611405 1 - 00 88042ca That error........... 660 - 00 00964419734:611408 1 - 00 88042ca Problem.............. 661 - 00 00964419734:611411 1 - 00 88042ca Something went wrong. 662 - 00 00964419734:611414 1 - 00 88042ca Everything ok........ 
663 - 00 00964419734:611417 1 - 00 88042ca data: 00000005 664 - 00 00964419734:611419 1 - 00 88042ca data: 00000006 665 - 00 00964419734:611422 1 - 00 88042ca data: 00000007 666 - 00 00964419734:611425 1 - 00 88042ca data: 00000008 667 - 00 00964419734:611428 1 - 00 88042ca data: 00000009
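The custom-view example in the removed s390dbf text selects a message by index and falls back to a hex dump for unknown values. The lookup can be tried outside the kernel with a minimal user-space sketch, under stated assumptions: `format_entry()` and `NR_MESSAGES` are hypothetical names introduced here, not part of the debug feature API, and the real `format_proc` callback receives a `debug_info_t` and a `struct debug_view` instead of explicit sizes. The sketch also rejects negative indices explicitly and copies the value with `memcpy()` rather than dereferencing a cast pointer.

```c
#include <stdio.h>
#include <string.h>

/* Message table from the example; the trailing NULL is a sentinel. */
static const char *const messages[] = {
	"This error...........\n",
	"That error...........\n",
	"Problem..............\n",
	"Something went wrong.\n",
	"Everything ok........\n",
	NULL
};

/* Number of real messages, not counting the NULL sentinel. */
#define NR_MESSAGES (sizeof(messages) / sizeof(messages[0]) - 1)

/*
 * Hypothetical user-space stand-in for a format_proc callback.
 * in_buf points at the data field of one debug entry and buf_size
 * is the length of that field. Returns the result of snprintf(),
 * or 0 if the data field is too small to hold an int.
 */
static int format_entry(char *out_buf, size_t out_size,
			const char *in_buf, size_t buf_size)
{
	int msg_nr;

	if (buf_size < sizeof(msg_nr))
		return 0;

	/* Copy instead of casting the pointer to avoid unaligned access. */
	memcpy(&msg_nr, in_buf, sizeof(msg_nr));

	if (msg_nr >= 0 && (size_t)msg_nr < NR_MESSAGES)
		return snprintf(out_buf, out_size, "%s", messages[msg_nr]);

	/* Unknown message number: print the raw value in hex. */
	return snprintf(out_buf, out_size, "data: %08x", (unsigned int)msg_nr);
}
```

Calling `format_entry()` with the values 0 to 9 yields the five table messages followed by `data: 00000005` through `data: 00000009`, which matches the data column of the `myview` output shown in the example.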
+11
Documentation/s390/text_files.rst
···
1 + ibm 3270 changelog
2 + ------------------
3 +
4 + .. include:: 3270.ChangeLog
5 +    :literal:
6 +
7 + ibm 3270 config3270.sh
8 + ----------------------
9 +
10 + .. literalinclude:: config3270.sh
11 +    :language: shell
+258 -229
Documentation/s390/vfio-ap.txt Documentation/s390/vfio-ap.rst
··· 1 - Introduction: 1 + =============================== 2 + Adjunct Processor (AP) facility 3 + =============================== 4 + 5 + 6 + Introduction 2 7 ============ 3 8 The Adjunct Processor (AP) facility is an IBM Z cryptographic facility comprised 4 9 of three AP instructions and from 1 up to 256 PCIe cryptographic adapter cards. ··· 16 11 facilities which do most of the hard work of providing direct access to AP 17 12 devices. 18 13 19 - AP Architectural Overview: 14 + AP Architectural Overview 20 15 ========================= 21 16 To facilitate the comprehension of the design, let's start with some 22 17 definitions: ··· 36 31 in the LPAR, the AP bus detects the AP adapter cards assigned to the LPAR and 37 32 creates a sysfs device for each assigned adapter. For example, if AP adapters 38 33 4 and 10 (0x0a) are assigned to the LPAR, the AP bus will create the following 39 - sysfs device entries: 34 + sysfs device entries:: 40 35 41 36 /sys/devices/ap/card04 42 37 /sys/devices/ap/card0a 43 38 44 39 Symbolic links to these devices will also be created in the AP bus devices 45 - sub-directory: 40 + sub-directory:: 46 41 47 42 /sys/bus/ap/devices/[card04] 48 43 /sys/bus/ap/devices/[card04] ··· 89 84 the cross product of the AP adapter and usage domain numbers detected when the 90 85 AP bus module is loaded. 
For example, if adapters 4 and 10 (0x0a) and usage 91 86 domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the 92 - following sysfs entries: 87 + following sysfs entries:: 93 88 94 89 /sys/devices/ap/card04/04.0006 95 90 /sys/devices/ap/card04/04.0047 ··· 97 92 /sys/devices/ap/card0a/0a.0047 98 93 99 94 The following symbolic links to these devices will be created in the AP bus 100 - devices subdirectory: 95 + devices subdirectory:: 101 96 102 97 /sys/bus/ap/devices/[04.0006] 103 98 /sys/bus/ap/devices/[04.0047] ··· 117 112 domain that is not one of the usage domains, but the modified domain 118 113 must be one of the control domains. 119 114 120 - AP and SIE: 115 + AP and SIE 121 116 ========== 122 117 Let's now take a look at how AP instructions executed on a guest are interpreted 123 118 by the hardware. ··· 158 153 159 154 The APQNs can provide secure key functionality - i.e., a private key is stored 160 155 on the adapter card for each of its domains - so each APQN must be assigned to 161 - at most one guest or to the linux host. 156 + at most one guest or to the linux host:: 162 157 163 158 Example 1: Valid configuration: 164 159 ------------------------------ ··· 186 181 This is an invalid configuration because both guests have access to 187 182 APQN (1,6). 188 183 189 - The Design: 190 - =========== 184 + The Design 185 + ========== 191 186 The design introduces three new objects: 192 187 193 188 1. 
AP matrix device ··· 210 205 Reserve APQNs for exclusive use of KVM guests 211 206 --------------------------------------------- 212 207 The following block diagram illustrates the mechanism by which APQNs are 213 - reserved: 208 + reserved:: 214 209 215 - +------------------+ 216 - 7 remove | | 217 - +--------------------> cex4queue driver | 218 - | | | 219 - | +------------------+ 220 - | 221 - | 222 - | +------------------+ +-----------------+ 223 - | 5 register driver | | 3 create | | 224 - | +----------------> Device core +----------> matrix device | 225 - | | | | | | 226 - | | +--------^---------+ +-----------------+ 227 - | | | 228 - | | +-------------------+ 229 - | | +-----------------------------------+ | 230 - | | | 4 register AP driver | | 2 register device 231 - | | | | | 232 - +--------+---+-v---+ +--------+-------+-+ 233 - | | | | 234 - | ap_bus +--------------------- > vfio_ap driver | 235 - | | 8 probe | | 236 - +--------^---------+ +--^--^------------+ 237 - 6 edit | | | 238 - apmask | +-----------------------------+ | 9 mdev create 239 - aqmask | | 1 modprobe | 240 - +--------+-----+---+ +----------------+-+ +------------------+ 241 - | | | |8 create | mediated | 242 - | admin | | VFIO device core |---------> matrix | 243 - | + | | | device | 244 - +------+-+---------+ +--------^---------+ +--------^---------+ 245 - | | | | 246 - | | 9 create vfio_ap-passthrough | | 247 - | +------------------------------+ | 248 - +-------------------------------------------------------------+ 249 - 10 assign adapter/domain/control domain 210 + +------------------+ 211 + 7 remove | | 212 + +--------------------> cex4queue driver | 213 + | | | 214 + | +------------------+ 215 + | 216 + | 217 + | +------------------+ +----------------+ 218 + | 5 register driver | | 3 create | | 219 + | +----------------> Device core +----------> matrix device | 220 + | | | | | | 221 + | | +--------^---------+ +----------------+ 222 + | | | 223 + | | +-------------------+ 224 + | | 
+-----------------------------------+ | 225 + | | | 4 register AP driver | | 2 register device 226 + | | | | | 227 + +--------+---+-v---+ +--------+-------+-+ 228 + | | | | 229 + | ap_bus +--------------------- > vfio_ap driver | 230 + | | 8 probe | | 231 + +--------^---------+ +--^--^------------+ 232 + 6 edit | | | 233 + apmask | +-----------------------------+ | 9 mdev create 234 + aqmask | | 1 modprobe | 235 + +--------+-----+---+ +----------------+-+ +----------------+ 236 + | | | |8 create | mediated | 237 + | admin | | VFIO device core |---------> matrix | 238 + | + | | | device | 239 + +------+-+---------+ +--------^---------+ +--------^-------+ 240 + | | | | 241 + | | 9 create vfio_ap-passthrough | | 242 + | +------------------------------+ | 243 + +-------------------------------------------------------------+ 244 + 10 assign adapter/domain/control domain 250 245 251 246 The process for reserving an AP queue for use by a KVM guest is: 252 247 ··· 255 250 device with the device core. This will serve as the parent device for 256 251 all mediated matrix devices used to configure an AP matrix for a guest. 257 252 3. The /sys/devices/vfio_ap/matrix device is created by the device core 258 - 4 The vfio_ap device driver will register with the AP bus for AP queue devices 253 + 4. The vfio_ap device driver will register with the AP bus for AP queue devices 259 254 of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap 260 255 driver's probe and remove callback interfaces. Devices older than CEX4 queues 261 256 are not supported to simplify the implementation by not needlessly ··· 271 266 it. 272 267 9. The administrator creates a passthrough type mediated matrix device to be 273 268 used by a guest 274 - 10 The administrator assigns the adapters, usage domains and control domains 275 - to be exclusively used by a guest. 269 + 10. The administrator assigns the adapters, usage domains and control domains 270 + to be exclusively used by a guest. 
276 271 277 272 Set up the VFIO mediated device interfaces 278 273 ------------------------------------------ 279 274 The VFIO AP device driver utilizes the common interface of the VFIO mediated 280 275 device core driver to: 276 + 281 277 * Register an AP mediated bus driver to add a mediated matrix device to and 282 278 remove it from a VFIO group. 283 279 * Create and destroy a mediated matrix device ··· 286 280 * Add a mediated matrix device to and remove it from an IOMMU group 287 281 288 282 The following high-level block diagram shows the main components and interfaces 289 - of the VFIO AP mediated matrix device driver: 283 + of the VFIO AP mediated matrix device driver:: 290 284 291 - +-------------+ 292 - | | 293 - | +---------+ | mdev_register_driver() +--------------+ 294 - | | Mdev | +<-----------------------+ | 295 - | | bus | | | vfio_mdev.ko | 296 - | | driver | +----------------------->+ |<-> VFIO user 297 - | +---------+ | probe()/remove() +--------------+ APIs 298 - | | 299 - | MDEV CORE | 300 - | MODULE | 301 - | mdev.ko | 302 - | +---------+ | mdev_register_device() +--------------+ 303 - | |Physical | +<-----------------------+ | 304 - | | device | | | vfio_ap.ko |<-> matrix 305 - | |interface| +----------------------->+ | device 306 - | +---------+ | callback +--------------+ 307 - +-------------+ 285 + +-------------+ 286 + | | 287 + | +---------+ | mdev_register_driver() +--------------+ 288 + | | Mdev | +<-----------------------+ | 289 + | | bus | | | vfio_mdev.ko | 290 + | | driver | +----------------------->+ |<-> VFIO user 291 + | +---------+ | probe()/remove() +--------------+ APIs 292 + | | 293 + | MDEV CORE | 294 + | MODULE | 295 + | mdev.ko | 296 + | +---------+ | mdev_register_device() +--------------+ 297 + | |Physical | +<-----------------------+ | 298 + | | device | | | vfio_ap.ko |<-> matrix 299 + | |interface| +----------------------->+ | device 300 + | +---------+ | callback +--------------+ 301 + +-------------+ 308 302 309 
303 During initialization of the vfio_ap module, the matrix device is registered 310 304 with an 'mdev_parent_ops' structure that provides the sysfs attribute ··· 312 306 matrix device. 313 307 314 308 * sysfs attribute structures: 315 - * supported_type_groups 309 + 310 + supported_type_groups 316 311 The VFIO mediated device framework supports creation of user-defined 317 312 mediated device types. These mediated device types are specified 318 313 via the 'supported_type_groups' structure when a device is registered ··· 325 318 326 319 The VFIO AP device driver will register one mediated device type for 327 320 passthrough devices: 321 + 328 322 /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough 323 + 329 324 Only the read-only attributes required by the VFIO mdev framework will 330 - be provided: 331 - ... name 332 - ... device_api 333 - ... available_instances 334 - ... device_api 335 - Where: 336 - * name: specifies the name of the mediated device type 337 - * device_api: the mediated device type's API 338 - * available_instances: the number of mediated matrix passthrough devices 339 - that can be created 340 - * device_api: specifies the VFIO API 341 - * mdev_attr_groups 325 + be provided:: 326 + 327 + ... name 328 + ... device_api 329 + ... available_instances 330 + ... device_api 331 + 332 + Where: 333 + 334 + * name: 335 + specifies the name of the mediated device type 336 + * device_api: 337 + the mediated device type's API 338 + * available_instances: 339 + the number of mediated matrix passthrough devices 340 + that can be created 341 + * device_api: 342 + specifies the VFIO API 343 + mdev_attr_groups 342 344 This attribute group identifies the user-defined sysfs attributes of the 343 345 mediated device. When a device is registered with the VFIO mediated device 344 346 framework, the sysfs attribute files identified in the 'mdev_attr_groups' 345 347 structure will be created in the mediated matrix device's directory. 
The 346 348 sysfs attributes for a mediated matrix device are: 347 - * assign_adapter: 348 - * unassign_adapter: 349 + 350 + assign_adapter / unassign_adapter: 349 351 Write-only attributes for assigning/unassigning an AP adapter to/from the 350 352 mediated matrix device. To assign/unassign an adapter, the APID of the 351 353 adapter is echoed to the respective attribute file. 352 - * assign_domain: 353 - * unassign_domain: 354 + assign_domain / unassign_domain: 354 355 Write-only attributes for assigning/unassigning an AP usage domain to/from 355 356 the mediated matrix device. To assign/unassign a domain, the domain 356 357 number of the the usage domain is echoed to the respective attribute 357 358 file. 358 - * matrix: 359 + matrix: 359 360 A read-only file for displaying the APQNs derived from the cross product 360 361 of the adapter and domain numbers assigned to the mediated matrix device. 361 - * assign_control_domain: 362 - * unassign_control_domain: 362 + assign_control_domain / unassign_control_domain: 363 363 Write-only attributes for assigning/unassigning an AP control domain 364 364 to/from the mediated matrix device. To assign/unassign a control domain, 365 365 the ID of the domain to be assigned/unassigned is echoed to the respective 366 366 attribute file. 367 - * control_domains: 367 + control_domains: 368 368 A read-only file for displaying the control domain numbers assigned to the 369 369 mediated matrix device. 370 370 371 371 * functions: 372 - * create: 372 + 373 + create: 373 374 allocates the ap_matrix_mdev structure used by the vfio_ap driver to: 375 + 374 376 * Store the reference to the KVM structure for the guest using the mdev 375 377 * Store the AP matrix configuration for the adapters, domains, and control 376 378 domains assigned via the corresponding sysfs attributes files 377 - * remove: 379 + 380 + remove: 378 381 deallocates the mediated matrix device's ap_matrix_mdev structure. 
This will 379 382 be allowed only if a running guest is not using the mdev. 380 383 381 384 * callback interfaces 382 - * open: 385 + 386 + open: 383 387 The vfio_ap driver uses this callback to register a 384 388 VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the mdev matrix 385 389 device. The open is invoked when QEMU connects the VFIO iommu group ··· 398 380 to configure the KVM guest is provided via this callback. The KVM structure, 399 381 is used to configure the guest's access to the AP matrix defined via the 400 382 mediated matrix device's sysfs attribute files. 401 - * release: 383 + release: 402 384 unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the 403 385 mdev matrix device and deconfigures the guest's AP matrix. 404 386 405 - Configure the APM, AQM and ADM in the CRYCB: 387 + Configure the APM, AQM and ADM in the CRYCB 406 388 ------------------------------------------- 407 389 Configuring the AP matrix for a KVM guest will be performed when the 408 390 VFIO_GROUP_NOTIFY_SET_KVM notifier callback is invoked. The notifier 409 391 function is called when QEMU connects to KVM. The guest's AP matrix is 410 392 configured via it's CRYCB by: 393 + 411 394 * Setting the bits in the APM corresponding to the APIDs assigned to the 412 395 mediated matrix device via its 'assign_adapter' interface. 413 396 * Setting the bits in the AQM corresponding to the domains assigned to the ··· 437 418 438 419 Note: If the user chooses to specify a CPU model different than the 'host' 439 420 model to QEMU, the CPU model features and facilities need to be turned on 440 - explicitly; for example: 421 + explicitly; for example:: 441 422 442 423 /usr/bin/qemu-system-s390x ... -cpu z13,ap=on,apqci=on,apft=on 443 424 444 425 A guest can be precluded from using AP features/facilities by turning them off 445 - explicitly; for example: 426 + explicitly; for example:: 446 427 447 428 /usr/bin/qemu-system-s390x ... 
-cpu host,ap=off,apqci=off,apft=off 448 429 ··· 454 435 drivers will fail since only type 10 and newer devices can be configured for 455 436 guest use. 456 437 457 - Example: 438 + Example 458 439 ======= 459 440 Let's now provide an example to illustrate how KVM guests may be given 460 441 access to AP facilities. For this example, we will show how to configure ··· 463 444 464 445 Guest1 465 446 ------ 447 + =========== ===== ============ 466 448 CARD.DOMAIN TYPE MODE 467 - ------------------------------ 449 + =========== ===== ============ 468 450 05 CEX5C CCA-Coproc 469 451 05.0004 CEX5C CCA-Coproc 470 452 05.00ab CEX5C CCA-Coproc 471 453 06 CEX5A Accelerator 472 454 06.0004 CEX5A Accelerator 473 455 06.00ab CEX5C CCA-Coproc 456 + =========== ===== ============ 474 457 475 458 Guest2 476 459 ------ 460 + =========== ===== ============ 477 461 CARD.DOMAIN TYPE MODE 478 - ------------------------------ 462 + =========== ===== ============ 479 463 05 CEX5A Accelerator 480 464 05.0047 CEX5A Accelerator 481 465 05.00ff CEX5A Accelerator 466 + =========== ===== ============ 482 467 483 468 Guest2 484 469 ------ 470 + =========== ===== ============ 485 471 CARD.DOMAIN TYPE MODE 486 - ------------------------------ 472 + =========== ===== ============ 487 473 06 CEX5A Accelerator 488 474 06.0047 CEX5A Accelerator 489 475 06.00ff CEX5A Accelerator 476 + =========== ===== ============ 490 477 491 478 These are the steps: 492 479 ··· 517 492 * VFIO_MDEV_DEVICE 518 493 * KVM 519 494 520 - If using make menuconfig select the following to build the vfio_ap module: 521 - -> Device Drivers 522 - -> IOMMU Hardware Support 523 - select S390 AP IOMMU Support 524 - -> VFIO Non-Privileged userspace driver framework 525 - -> Mediated device driver frramework 526 - -> VFIO driver for Mediated devices 527 - -> I/O subsystem 528 - -> VFIO support for AP devices 495 + If using make menuconfig select the following to build the vfio_ap module:: 496 + 497 + -> Device Drivers 498 + -> IOMMU 
Hardware Support 499 + select S390 AP IOMMU Support 500 + -> VFIO Non-Privileged userspace driver framework 501 + -> Mediated device driver frramework 502 + -> VFIO driver for Mediated devices 503 + -> I/O subsystem 504 + -> VFIO support for AP devices 529 505 530 506 2. Secure the AP queues to be used by the three guests so that the host can not 531 507 access them. To secure them, there are two sysfs files that specify 532 508 bitmasks marking a subset of the APQN range as 'usable by the default AP 533 509 queue device drivers' or 'not usable by the default device drivers' and thus 534 510 available for use by the vfio_ap device driver'. The location of the sysfs 535 - files containing the masks are: 511 + files containing the masks are:: 536 512 537 - /sys/bus/ap/apmask 538 - /sys/bus/ap/aqmask 513 + /sys/bus/ap/apmask 514 + /sys/bus/ap/aqmask 539 515 540 516 The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs 541 517 (APID). Each bit in the mask, from left to right (i.e., from most significant ··· 552 526 queue device drivers; otherwise, the APQI is usable by the vfio_ap device 553 527 driver. 554 528 555 - Take, for example, the following mask: 529 + Take, for example, the following mask:: 556 530 557 531 0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff 558 532 ··· 574 548 respective sysfs mask file in one of two formats: 575 549 576 550 * An absolute hex string starting with 0x - like "0x12345678" - sets 577 - the mask. If the given string is shorter than the mask, it is padded 578 - with 0s on the right; for example, specifying a mask value of 0x41 is 579 - the same as specifying: 551 + the mask. 
If the given string is shorter than the mask, it is padded 552 + with 0s on the right; for example, specifying a mask value of 0x41 is 553 + the same as specifying:: 580 554 581 - 0x4100000000000000000000000000000000000000000000000000000000000000 555 + 0x4100000000000000000000000000000000000000000000000000000000000000 582 556 583 - Keep in mind that the mask reads from left to right (i.e., most 584 - significant to least significant bit in big endian order), so the mask 585 - above identifies device numbers 1 and 7 (01000001). 557 + Keep in mind that the mask reads from left to right (i.e., most 558 + significant to least significant bit in big endian order), so the mask 559 + above identifies device numbers 1 and 7 (01000001). 586 560 587 - If the string is longer than the mask, the operation is terminated with 588 - an error (EINVAL). 561 + If the string is longer than the mask, the operation is terminated with 562 + an error (EINVAL). 589 563 590 564 * Individual bits in the mask can be switched on and off by specifying 591 - each bit number to be switched in a comma separated list. Each bit 592 - number string must be prepended with a ('+') or minus ('-') to indicate 593 - the corresponding bit is to be switched on ('+') or off ('-'). Some 594 - valid values are: 565 + each bit number to be switched in a comma separated list. Each bit 566 + number string must be prepended with a ('+') or minus ('-') to indicate 567 + the corresponding bit is to be switched on ('+') or off ('-'). 
Some 568 + valid values are: 595 569 596 - "+0" switches bit 0 on 597 - "-13" switches bit 13 off 598 - "+0x41" switches bit 65 on 599 - "-0xff" switches bit 255 off 570 + - "+0" switches bit 0 on 571 + - "-13" switches bit 13 off 572 + - "+0x41" switches bit 65 on 573 + - "-0xff" switches bit 255 off 600 574 601 - The following example: 602 - +0,-6,+0x47,-0xf0 575 + The following example: 603 576 604 - Switches bits 0 and 71 (0x47) on 605 - Switches bits 6 and 240 (0xf0) off 577 + +0,-6,+0x47,-0xf0 606 578 607 - Note that the bits not specified in the list remain as they were before 608 - the operation. 579 + Switches bits 0 and 71 (0x47) on 580 + 581 + Switches bits 6 and 240 (0xf0) off 582 + 583 + Note that the bits not specified in the list remain as they were before 584 + the operation. 609 585 610 586 2. The masks can also be changed at boot time via parameters on the kernel 611 587 command line like this: 612 588 613 - ap.apmask=0xffff ap.aqmask=0x40 589 + ap.apmask=0xffff ap.aqmask=0x40 614 590 615 - This would create the following masks: 591 + This would create the following masks:: 616 592 617 - apmask: 618 - 0xffff000000000000000000000000000000000000000000000000000000000000 593 + apmask: 594 + 0xffff000000000000000000000000000000000000000000000000000000000000 619 595 620 - aqmask: 621 - 0x4000000000000000000000000000000000000000000000000000000000000000 596 + aqmask: 597 + 0x4000000000000000000000000000000000000000000000000000000000000000 622 598 623 - Resulting in these two pools: 599 + Resulting in these two pools:: 624 600 625 - default drivers pool: adapter 0-15, domain 1 626 - alternate drivers pool: adapter 16-255, domains 0, 2-255 601 + default drivers pool: adapter 0-15, domain 1 602 + alternate drivers pool: adapter 16-255, domains 0, 2-255 627 603 628 - Securing the APQNs for our example: 629 - ---------------------------------- 604 + Securing the APQNs for our example 605 + ---------------------------------- 630 606 To secure the AP queues 
05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047, 631 607 06.00ab, and 06.00ff for use by the vfio_ap device driver, the corresponding 632 - APQNs can either be removed from the default masks: 608 + APQNs can either be removed from the default masks:: 633 609 634 610 echo -5,-6 > /sys/bus/ap/apmask 635 611 636 612 echo -4,-0x47,-0xab,-0xff > /sys/bus/ap/aqmask 637 613 638 - Or the masks can be set as follows: 614 + Or the masks can be set as follows:: 639 615 640 616 echo 0xf9ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff \ 641 617 > apmask ··· 648 620 This will result in AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 649 621 06.0047, 06.00ab, and 06.00ff getting bound to the vfio_ap device driver. The 650 622 sysfs directory for the vfio_ap device driver will now contain symbolic links 651 - to the AP queue devices bound to it: 623 + to the AP queue devices bound to it:: 652 624 653 - /sys/bus/ap 654 - ... [drivers] 655 - ...... [vfio_ap] 656 - ......... [05.0004] 657 - ......... [05.0047] 658 - ......... [05.00ab] 659 - ......... [05.00ff] 660 - ......... [06.0004] 661 - ......... [06.0047] 662 - ......... [06.00ab] 663 - ......... [06.00ff] 625 + /sys/bus/ap 626 + ... [drivers] 627 + ...... [vfio_ap] 628 + ......... [05.0004] 629 + ......... [05.0047] 630 + ......... [05.00ab] 631 + ......... [05.00ff] 632 + ......... [06.0004] 633 + ......... [06.0047] 634 + ......... [06.00ab] 635 + ......... [06.00ff] 664 636 665 637 Keep in mind that only type 10 and newer adapters (i.e., CEX4 and later) 666 638 can be bound to the vfio_ap device driver. The reason for this is to ··· 673 645 queue device can be read from the parent card's sysfs directory. For example, 674 646 to see the hardware type of the queue 05.0004: 675 647 676 - cat /sys/bus/ap/devices/card05/hwtype 648 + cat /sys/bus/ap/devices/card05/hwtype 677 649 678 650 The hwtype must be 10 or higher (CEX4 or newer) in order to be bound to the 679 651 vfio_ap device driver. 
680 652 681 653 3. Create the mediated devices needed to configure the AP matrixes for the 682 654 three guests and to provide an interface to the vfio_ap driver for 683 - use by the guests: 655 + use by the guests:: 684 656 685 - /sys/devices/vfio_ap/matrix/ 686 - --- [mdev_supported_types] 687 - ------ [vfio_ap-passthrough] (passthrough mediated matrix device type) 688 - --------- create 689 - --------- [devices] 657 + /sys/devices/vfio_ap/matrix/ 658 + --- [mdev_supported_types] 659 + ------ [vfio_ap-passthrough] (passthrough mediated matrix device type) 660 + --------- create 661 + --------- [devices] 690 662 691 - To create the mediated devices for the three guests: 663 + To create the mediated devices for the three guests:: 692 664 693 665 uuidgen > create 694 666 uuidgen > create 695 667 uuidgen > create 696 668 697 - or 669 + or 698 670 699 - echo $uuid1 > create 700 - echo $uuid2 > create 701 - echo $uuid3 > create 671 + echo $uuid1 > create 672 + echo $uuid2 > create 673 + echo $uuid3 > create 702 674 703 675 This will create three mediated devices in the [devices] subdirectory named 704 676 after the UUID written to the create attribute file. 
We call them $uuid1, 705 - $uuid2 and $uuid3 and this is the sysfs directory structure after creation: 677 + $uuid2 and $uuid3 and this is the sysfs directory structure after creation:: 706 678 707 - /sys/devices/vfio_ap/matrix/ 708 - --- [mdev_supported_types] 709 - ------ [vfio_ap-passthrough] 710 - --------- [devices] 711 - ------------ [$uuid1] 712 - --------------- assign_adapter 713 - --------------- assign_control_domain 714 - --------------- assign_domain 715 - --------------- matrix 716 - --------------- unassign_adapter 717 - --------------- unassign_control_domain 718 - --------------- unassign_domain 679 + /sys/devices/vfio_ap/matrix/ 680 + --- [mdev_supported_types] 681 + ------ [vfio_ap-passthrough] 682 + --------- [devices] 683 + ------------ [$uuid1] 684 + --------------- assign_adapter 685 + --------------- assign_control_domain 686 + --------------- assign_domain 687 + --------------- matrix 688 + --------------- unassign_adapter 689 + --------------- unassign_control_domain 690 + --------------- unassign_domain 719 691 720 - ------------ [$uuid2] 721 - --------------- assign_adapter 722 - --------------- assign_control_domain 723 - --------------- assign_domain 724 - --------------- matrix 725 - --------------- unassign_adapter 726 - ----------------unassign_control_domain 727 - ----------------unassign_domain 692 + ------------ [$uuid2] 693 + --------------- assign_adapter 694 + --------------- assign_control_domain 695 + --------------- assign_domain 696 + --------------- matrix 697 + --------------- unassign_adapter 698 + ----------------unassign_control_domain 699 + ----------------unassign_domain 728 700 729 - ------------ [$uuid3] 730 - --------------- assign_adapter 731 - --------------- assign_control_domain 732 - --------------- assign_domain 733 - --------------- matrix 734 - --------------- unassign_adapter 735 - ----------------unassign_control_domain 736 - ----------------unassign_domain 701 + ------------ [$uuid3] 702 + 
--------------- assign_adapter 703 + --------------- assign_control_domain 704 + --------------- assign_domain 705 + --------------- matrix 706 + --------------- unassign_adapter 707 + ----------------unassign_control_domain 708 + ----------------unassign_domain 737 709 738 710 4. The administrator now needs to configure the matrixes for the mediated 739 711 devices $uuid1 (for Guest1), $uuid2 (for Guest2) and $uuid3 (for Guest3). 740 712 741 - This is how the matrix is configured for Guest1: 713 + This is how the matrix is configured for Guest1:: 742 714 743 715 echo 5 > assign_adapter 744 716 echo 6 > assign_adapter 745 717 echo 4 > assign_domain 746 718 echo 0xab > assign_domain 747 719 748 - Control domains can similarly be assigned using the assign_control_domain 749 - sysfs file. 720 + Control domains can similarly be assigned using the assign_control_domain 721 + sysfs file. 750 722 751 - If a mistake is made configuring an adapter, domain or control domain, 752 - you can use the unassign_xxx files to unassign the adapter, domain or 753 - control domain. 723 + If a mistake is made configuring an adapter, domain or control domain, 724 + you can use the unassign_xxx files to unassign the adapter, domain or 725 + control domain. 754 726 755 - To display the matrix configuration for Guest1: 727 + To display the matrix configuration for Guest1:: 756 728 757 - cat matrix 729 + cat matrix 758 730 759 - This is how the matrix is configured for Guest2: 731 + This is how the matrix is configured for Guest2:: 760 732 761 733 echo 5 > assign_adapter 762 734 echo 0x47 > assign_domain 763 735 echo 0xff > assign_domain 764 736 765 - This is how the matrix is configured for Guest3: 737 + This is how the matrix is configured for Guest3:: 766 738 767 739 echo 6 > assign_adapter 768 740 echo 0x47 > assign_domain ··· 811 783 configured for the system. If a control domain number higher than the maximum 812 784 is specified, the operation will terminate with an error (ENODEV). 
813 785 814 - 5. Start Guest1: 786 + 5. Start Guest1:: 815 787 816 - /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ 817 - -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ... 788 + /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ 789 + -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ... 818 790 819 - 7. Start Guest2: 791 + 7. Start Guest2:: 820 792 821 - /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ 822 - -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ... 793 + /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ 794 + -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ... 823 795 824 - 7. Start Guest3: 796 + 7. Start Guest3:: 825 797 826 - /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ 827 - -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ... 798 + /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ 799 + -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ... 828 800 829 801 When the guest is shut down, the mediated matrix devices may be removed. 830 802 831 - Using our example again, to remove the mediated matrix device $uuid1: 803 + Using our example again, to remove the mediated matrix device $uuid1:: 832 804 833 805 /sys/devices/vfio_ap/matrix/ 834 806 --- [mdev_supported_types] ··· 837 809 ------------ [$uuid1] 838 810 --------------- remove 839 811 812 + :: 840 813 841 814 echo 1 > remove 842 815 843 - This will remove all of the mdev matrix device's sysfs structures including 844 - the mdev device itself. To recreate and reconfigure the mdev matrix device, 845 - all of the steps starting with step 3 will have to be performed again. Note 846 - that the remove will fail if a guest using the mdev is still running. 816 + This will remove all of the mdev matrix device's sysfs structures including 817 + the mdev device itself. 
To recreate and reconfigure the mdev matrix device, 818 + all of the steps starting with step 3 will have to be performed again. Note 819 + that the remove will fail if a guest using the mdev is still running. 847 820 848 - It is not necessary to remove an mdev matrix device, but one may want to 849 - remove it if no guest will use it during the remaining lifetime of the linux 850 - host. If the mdev matrix device is removed, one may want to also reconfigure 851 - the pool of adapters and queues reserved for use by the default drivers. 821 + It is not necessary to remove an mdev matrix device, but one may want to 822 + remove it if no guest will use it during the remaining lifetime of the linux 823 + host. If the mdev matrix device is removed, one may want to also reconfigure 824 + the pool of adapters and queues reserved for use by the default drivers. 852 825 853 826 Limitations 854 827 ===========
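Note on the vfio-ap hunks above: the mask rules they describe (short hex strings are right-padded with zeros, bit 0 is the leftmost bit, and `+N`/`-N` lists switch single bits) can be checked with a small model. The following is a minimal sketch, not kernel code. It assumes 256-bit masks, as the 64-hex-digit examples suggest, and the helper names `parse_mask`, `set_bits`, `apply_bit_list` and `apqns` are hypothetical.

```python
MASK_BITS = 256  # assumption: apmask/aqmask are 256 bits wide (64 hex digits)


def parse_mask(text):
    """Right-pad a hex string such as '0x41' with zeros to the full mask width."""
    digits = text[2:] if text.lower().startswith("0x") else text
    if len(digits) * 4 > MASK_BITS:
        # The documentation says the kernel rejects this case with EINVAL.
        raise ValueError("mask string is longer than the mask")
    return int(digits.ljust(MASK_BITS // 4, "0"), 16)


def set_bits(mask):
    """Return the bit numbers that are on, counting bit 0 as the leftmost bit."""
    return [n for n in range(MASK_BITS) if mask >> (MASK_BITS - 1 - n) & 1]


def apply_bit_list(mask, spec):
    """Apply a list such as '+0,-6,+0x47,-0xf0'; unlisted bits stay unchanged."""
    for item in spec.split(","):
        item = item.strip()
        sign, number = item[0], int(item[1:], 0)  # base 0 accepts '13' and '0x41'
        if sign not in "+-" or not 0 <= number < MASK_BITS:
            raise ValueError(f"bad bit specification: {item!r}")
        bit = 1 << (MASK_BITS - 1 - number)
        mask = mask | bit if sign == "+" else mask & ~bit
    return mask


def apqns(adapters, domains):
    """Format every adapter/domain pair the way the sysfs listing names queues."""
    return [f"{a:02x}.{d:04x}" for a in adapters for d in domains]
```

With this model, `set_bits(parse_mask("0x41"))` gives bits 1 and 7, and clearing bits 5 and 6 of an all-ones mask gives a value starting with `f9`, which agrees with the two equivalent `echo` forms shown in the hunk.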
+59 -33
Documentation/s390/vfio-ccw.txt Documentation/s390/vfio-ccw.rst
··· 1 + ================================== 1 2 vfio-ccw: the basic infrastructure 2 3 ================================== 3 4 ··· 12 11 Different than other hardware architectures, s390 has defined a unified 13 12 I/O access method, which is so called Channel I/O. It has its own access 14 13 patterns: 14 + 15 15 - Channel programs run asynchronously on a separate (co)processor. 16 16 - The channel subsystem will access any memory designated by the caller 17 17 in the channel program directly, i.e. there is no iommu involved. 18 + 18 19 Thus when we introduce vfio support for these devices, we realize it 19 20 with a mediated device (mdev) implementation. The vfio mdev will be 20 21 added to an iommu group, so as to make itself able to be managed by the ··· 27 24 28 25 This document does not intend to explain the s390 I/O architecture in 29 26 every detail. More information/reference could be found here: 27 + 30 28 - A good start to know Channel I/O in general: 31 29 https://en.wikipedia.org/wiki/Channel_I/O 32 30 - s390 architecture: ··· 84 80 interrupt handler in the form of interrupt response block (IRB). 85 81 86 82 Back to vfio-ccw, in short: 83 + 87 84 - ORBs and channel programs are built in guest kernel (with guest 88 85 physical addresses). 89 86 - ORBs and channel programs are passed to the host kernel. ··· 111 106 112 107 Within this implementation, we have two drivers for two types of 113 108 devices: 109 + 114 110 - The vfio_ccw driver for the physical subchannel device. 115 111 This is an I/O subchannel driver for the real subchannel device. It 116 112 realizes a group of callbacks and registers to the mdev framework as a ··· 143 137 vfio_pin_pages and a vfio_unpin_pages interfaces from the vfio iommu 144 138 backend for the physical devices to pin and unpin pages by demand. 145 139 146 - Below is a high Level block diagram. 
140 + Below is a high Level block diagram:: 147 141 148 142 +-------------+ 149 143 | | ··· 164 158 +-------------+ 165 159 166 160 The process of how these work together. 161 + 167 162 1. vfio_ccw.ko drives the physical I/O subchannel, and registers the 168 163 physical device (with callbacks) to mdev framework. 169 164 When vfio_ccw probing the subchannel device, it registers device ··· 185 178 186 179 An I/O region is used to accept channel program request from user 187 180 space and store I/O interrupt result for user space to retrieve. The 188 - definition of the region is: 181 + definition of the region is:: 189 182 190 - struct ccw_io_region { 191 - #define ORB_AREA_SIZE 12 192 - __u8 orb_area[ORB_AREA_SIZE]; 193 - #define SCSW_AREA_SIZE 12 194 - __u8 scsw_area[SCSW_AREA_SIZE]; 195 - #define IRB_AREA_SIZE 96 196 - __u8 irb_area[IRB_AREA_SIZE]; 197 - __u32 ret_code; 198 - } __packed; 183 + struct ccw_io_region { 184 + #define ORB_AREA_SIZE 12 185 + __u8 orb_area[ORB_AREA_SIZE]; 186 + #define SCSW_AREA_SIZE 12 187 + __u8 scsw_area[SCSW_AREA_SIZE]; 188 + #define IRB_AREA_SIZE 96 189 + __u8 irb_area[IRB_AREA_SIZE]; 190 + __u32 ret_code; 191 + } __packed; 199 192 200 193 While starting an I/O request, orb_area should be filled with the 201 194 guest ORB, and scsw_area should be filled with the SCSW of the Virtual ··· 212 205 vfio-iommu-type1 as the vfio iommu backend. 213 206 214 207 * CCW translation APIs 215 - A group of APIs (start with 'cp_') to do CCW translation. The CCWs 208 + A group of APIs (start with `cp_`) to do CCW translation. The CCWs 216 209 passed in by a user space program are organized with their guest 217 210 physical memory addresses. These APIs will copy the CCWs into kernel 218 211 space, and assemble a runnable kernel channel program by updating the ··· 224 217 This driver utilizes the CCW translation APIs and introduces 225 218 vfio_ccw, which is the driver for the I/O subchannel devices you want 226 219 to pass through. 
227 - vfio_ccw implements the following vfio ioctls: 220 + vfio_ccw implements the following vfio ioctls:: 221 + 228 222 VFIO_DEVICE_GET_INFO 229 223 VFIO_DEVICE_GET_IRQ_INFO 230 224 VFIO_DEVICE_GET_REGION_INFO 231 225 VFIO_DEVICE_RESET 232 226 VFIO_DEVICE_SET_IRQS 227 + 233 228 This provides an I/O region, so that the user space program can pass a 234 229 channel program to the kernel, to do further CCW translation before 235 230 issuing them to a real device. ··· 245 236 handled (without error handling). 246 237 247 238 Explanation: 248 - Q1-Q7: QEMU side process. 249 - K1-K5: Kernel side process. 250 239 251 - Q1. Get I/O region info during initialization. 252 - Q2. Setup event notifier and handler to handle I/O completion. 240 + - Q1-Q7: QEMU side process. 241 + - K1-K5: Kernel side process. 253 242 254 - ... ... 243 + Q1. 244 + Get I/O region info during initialization. 255 245 256 - Q3. Intercept a ssch instruction. 257 - Q4. Write the guest channel program and ORB to the I/O region. 258 - K1. Copy from guest to kernel. 259 - K2. Translate the guest channel program to a host kernel space 260 - channel program, which becomes runnable for a real device. 261 - K3. With the necessary information contained in the orb passed in 262 - by QEMU, issue the ccwchain to the device. 263 - K4. Return the ssch CC code. 264 - Q5. Return the CC code to the guest. 246 + Q2. 247 + Setup event notifier and handler to handle I/O completion. 265 248 266 249 ... ... 267 250 268 - K5. Interrupt handler gets the I/O result and write the result to 269 - the I/O region. 270 - K6. Signal QEMU to retrieve the result. 271 - Q6. Get the signal and event handler reads out the result from the I/O 251 + Q3. 252 + Intercept a ssch instruction. 253 + Q4. 254 + Write the guest channel program and ORB to the I/O region. 255 + 256 + K1. 257 + Copy from guest to kernel. 258 + K2. 
259 + Translate the guest channel program to a host kernel space 260 + channel program, which becomes runnable for a real device. 261 + K3. 262 + With the necessary information contained in the orb passed in 263 + by QEMU, issue the ccwchain to the device. 264 + K4. 265 + Return the ssch CC code. 266 + Q5. 267 + Return the CC code to the guest. 268 + 269 + ... ... 270 + 271 + K5. 272 + Interrupt handler gets the I/O result and write the result to 273 + the I/O region. 274 + K6. 275 + Signal QEMU to retrieve the result. 276 + 277 + Q6. 278 + Get the signal and event handler reads out the result from the I/O 272 279 region. 273 - Q7. Update the irb for the guest. 280 + Q7. 281 + Update the irb for the guest. 274 282 275 283 Limitations 276 284 ----------- ··· 321 295 1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832) 322 296 2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204) 323 297 3. https://en.wikipedia.org/wiki/Channel_I/O 324 - 4. Documentation/s390/cds.txt 298 + 4. Documentation/s390/cds.rst 325 299 5. Documentation/vfio.txt 326 300 6. Documentation/vfio-mediated-device.txt
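Note on the `struct ccw_io_region` definition quoted in the vfio-ccw hunks above: because the structure is `__packed`, its fields sit back to back, for a total of 12 + 12 + 96 + 4 = 124 bytes. The sketch below models only that byte layout with Python's `struct` module. It is not QEMU or kernel code, the helper names `pack_request` and `unpack_result` are hypothetical, and how the image is actually transferred to the region is left out.

```python
import struct

ORB_AREA_SIZE = 12
SCSW_AREA_SIZE = 12
IRB_AREA_SIZE = 96

# '=' selects native byte order with standard sizes and no padding,
# which mirrors the __packed attribute on the C structure.
CCW_IO_REGION = struct.Struct(
    f"={ORB_AREA_SIZE}s{SCSW_AREA_SIZE}s{IRB_AREA_SIZE}sI"
)


def pack_request(orb, scsw):
    """Build a region image for a new request (hypothetical helper).

    orb_area and scsw_area are filled in by the caller; irb_area and
    ret_code are zeroed here purely for illustration.
    """
    if len(orb) != ORB_AREA_SIZE or len(scsw) != SCSW_AREA_SIZE:
        raise ValueError("orb and scsw must be exactly 12 bytes each")
    return CCW_IO_REGION.pack(orb, scsw, bytes(IRB_AREA_SIZE), 0)


def unpack_result(region):
    """Split a 124-byte region image into (irb_area, ret_code)."""
    _orb, _scsw, irb, ret_code = CCW_IO_REGION.unpack(region)
    return irb, ret_code
```

`CCW_IO_REGION.size` evaluates to 124, which is a quick way to confirm that no padding has crept into the layout.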
+2
Documentation/s390/zfcpdump.txt Documentation/s390/zfcpdump.rst
···
    1 + ==================================
1   2   The s390 SCSI dump tool (zfcpdump)
    3 + ==================================
2   4
3   5   System z machines (z900 or higher) provide hardware support for creating system
4   6   dumps on SCSI disks. The dump process is initiated by booting a dump tool, which
+2 -2
MAINTAINERS
···
13703 13703   L: kvm@vger.kernel.org
13704 13704   S: Supported
13705 13705   F: drivers/s390/cio/vfio_ccw*
13706       - F: Documentation/s390/vfio-ccw.txt
      13706 + F: Documentation/s390/vfio-ccw.rst
13707 13707   F: include/uapi/linux/vfio_ccw.h
13708 13708
13709 13709   S390 ZCRYPT DRIVER
···
13723 13723   F: drivers/s390/crypto/vfio_ap_drv.c
13724 13724   F: drivers/s390/crypto/vfio_ap_private.h
13725 13725   F: drivers/s390/crypto/vfio_ap_ops.c
13726       - F: Documentation/s390/vfio-ap.txt
      13726 + F: Documentation/s390/vfio-ap.rst
13727 13727
13728 13728   S390 ZFCP DRIVER
13729 13729   M: Steffen Maier <maier@linux.ibm.com>
+2 -2
arch/s390/Kconfig
···
810 810   Crash dump kernels are loaded in the main kernel with kexec-tools
811 811   into a specially reserved region and then later executed after
812 812   a crash by kdump/kexec.
813     - Refer to <file:Documentation/s390/zfcpdump.txt> for more details on this.
    813 + Refer to <file:Documentation/s390/zfcpdump.rst> for more details on this.
814 814   This option also enables s390 zfcpdump.
815     - See also <file:Documentation/s390/zfcpdump.txt>
    815 + See also <file:Documentation/s390/zfcpdump.rst>
816 816
817 817   endmenu
818 818
+2 -2
arch/s390/include/asm/debug.h
···
152 152
153 153   /*
154 154   * IMPORTANT: Use "%s" in sprintf format strings with care! Only pointers are
155     - * stored in the s390dbf. See Documentation/s390/s390dbf.txt for more details!
    155 + * stored in the s390dbf. See Documentation/s390/s390dbf.rst for more details!
156 156   */
157 157   extern debug_entry_t *
158 158   __debug_sprintf_event(debug_info_t *id, int level, char *string, ...)
···
210 210
211 211   /*
212 212   * IMPORTANT: Use "%s" in sprintf format strings with care! Only pointers are
213     - * stored in the s390dbf. See Documentation/s390/s390dbf.txt for more details!
    213 + * stored in the s390dbf. See Documentation/s390/s390dbf.rst for more details!
214 214   */
215 215   extern debug_entry_t *
216 216   __debug_sprintf_exception(debug_info_t *id, int level, char *string, ...)
+1 -1
drivers/s390/char/zcore.c
···
 4  4   * dumps on SCSI disks (zfcpdump). The "zcore/mem" debugfs file shows the same
 5  5   * dump format as s390 standalone dumps.
 6  6   *
 7     - * For more information please refer to Documentation/s390/zfcpdump.txt
    7 + * For more information please refer to Documentation/s390/zfcpdump.rst
 8  8   *
 9  9   * Copyright IBM Corp. 2003, 2008
10 10   * Author(s): Michael Holzheu