this repo has no description
at trunk 97 lines 4.4 kB view raw view rendered
1# Object model 2 3The Object Model is the main difference that allows Skybison to be faster than 4CPython. In CPython, every object is a heap-allocated `PyObject*` -- and to 5speed this up, they use [free lists](https://en.wikipedia.org/wiki/Free_list) 6to avoid frequent `malloc`/`free` and have some static objects for small integers. 7(You can and should read more about the [CPython Data 8Model](https://docs.python.org/3/reference/datamodel.html) when you get a 9chance). 10 11Skybison is making this more efficient by having immediate types. All managed 12objects are represented by `RawObject`, which is a piece of memory of word 13length. It can either directly include the actual data (*immediate objects*), 14or point somewhere on the heap where the data is located (*heap-allocated 15objects*). 16 17> NOTE: This document was written during early project planning stages. Parts 18> of it may be out of date. 19 20## Object tagging 21 22The least significant bits of `uword` RawObject are used as a tag to determines 23the value type. At the conceptual level, we have two stages for resolving tags: 24a 3-bit low tag and a 5-bit high tag. This separation is only conceptual; in 25the code, we check against the full 5-bit tags using prefix masks. 26 27### 3-bit Tags 28 29* `**0` -- Small integers -- all bits except the last are used for the actual 30 value 31* `001` -- Heap-allocated objects -- the value of RawObject minus 1 is the actual 32 address 33* `011` -- Header (a special internal type, see below) 34* `101` -- Small sequence types and Error -- escape into the 8-bit tag 35* `111` -- Immediate objects -- escape into the 8-bit tag 36 37### 5-bit Tags (101) 38 39* `00101` -- Small bytes -- the 3 next least significant bits are used for the 40 length -- the remaining bytes of the value are used to store the actual bytes 41 in little endian order 42* `01101` -- Small string -- the 3 next least significant bits are used for the 43 length -- the remaining bytes of the value are used to store the actual string 44 in little endian order 45* `10101` -- Error -- the 3 next least significant bits store a tag indicating 46 the kind of Error. See `ErrorKind` in `runtime/objects.h`. Remaining bits are 47 all `0`. 48 49### 5-bit Tags (111) 50 51* `00111` -- Bool -- the next least significant bit is the value, true/false -- 52 remaining bits are all `0` 53* `01111` -- NotImplemented -- remaining bits are all `0` 54* `10111` -- Unbound -- remaining bits are all `0` 55* `11111` -- None -- remaining bits are all `1` -- this is so we can 56 `memset(obj.address(), -1, sizeof(obj))` to fill a heap-allocated object with 57 `None` 58 59With this design, many frequent objects are stored as immediate values and many 60operations don't need the extra indirection. 61 62Why this tagging scheme? Having small integers tagged with an extra zero at the 63end makes it possible to use many arithmetic instructions without additional 64work. To sum up two small integers, you just add the two RawObjects and the 65result is automatically the right small integer -- you just need to check for 66overflow. For heap-allocated objects, memory address always ends with `00` 67because of default memory alignment, so we don't lose any bits by adding this 68tag. Dereferencing does not have any extra overhead because all modern CPUs can 69support dereferencing with offset. 70 71## Heap-allocated objects 72 73The first word of every heap-allocated object contains the *Header* - a special 74structure with metadata about the object. Headers are tagged in the same way as 75other objects, which makes it easier for the GC to identify the start of an 76object when scanning the heap. The metadata includes type identification 77(layout id) and object hash. 78 79The rest of the allocated memory are attributes. A few object have immediate 80attributes (`Float` has an attribute of type `double`, `LargeStr` contains a 81`byte` array), but most attributes are simply other `RawObject`s. 82 83While small integers and small strings can be stored as immediate objects, 84heap-allocated counterparts are needed for larger values. `LargeStr` is 85basically a byte array with a header; `LargeInt` is an array of digits of word 86length, representing the number in two's-complement form (again, with a 87header). Special abstract types `Int` and `Str` are needed to hide this 88implementation detail -- managed code needs to work with normal `int` and `str` 89types. 90 91## Attribute access 92 93TODO: document 94 95## Subtypes 96 97TODO: document