this repo has no description
1# Object model
2
3The Object Model is the main difference that allows Skybison to be faster than
4CPython. In CPython, every object is a heap-allocated `PyObject*` -- and to
5speed this up, they use [free lists](https://en.wikipedia.org/wiki/Free_list)
6to avoid frequent `malloc`/`free` and have some static objects for small integers.
7(You can and should read more about the [CPython Data
8Model](https://docs.python.org/3/reference/datamodel.html) when you get a
9chance).
10
11Skybison is making this more efficient by having immediate types. All managed
12objects are represented by `RawObject`, which is a piece of memory of word
13length. It can either directly include the actual data (*immediate objects*),
14or point somewhere on the heap where the data is located (*heap-allocated
15objects*).
16
17> NOTE: This document was written during early project planning stages. Parts
18> of it may be out of date.
19
20## Object tagging
21
22The least significant bits of `uword` RawObject are used as a tag to determines
23the value type. At the conceptual level, we have two stages for resolving tags:
24a 3-bit low tag and a 5-bit high tag. This separation is only conceptual; in
25the code, we check against the full 5-bit tags using prefix masks.
26
27### 3-bit Tags
28
29* `**0` -- Small integers -- all bits except the last are used for the actual
30 value
31* `001` -- Heap-allocated objects -- the value of RawObject minus 1 is the actual
32 address
33* `011` -- Header (a special internal type, see below)
34* `101` -- Small sequence types and Error -- escape into the 8-bit tag
35* `111` -- Immediate objects -- escape into the 8-bit tag
36
37### 5-bit Tags (101)
38
39* `00101` -- Small bytes -- the 3 next least significant bits are used for the
40 length -- the remaining bytes of the value are used to store the actual bytes
41 in little endian order
42* `01101` -- Small string -- the 3 next least significant bits are used for the
43 length -- the remaining bytes of the value are used to store the actual string
44 in little endian order
45* `10101` -- Error -- the 3 next least significant bits store a tag indicating
46 the kind of Error. See `ErrorKind` in `runtime/objects.h`. Remaining bits are
47 all `0`.
48
49### 5-bit Tags (111)
50
51* `00111` -- Bool -- the next least significant bit is the value, true/false --
52 remaining bits are all `0`
53* `01111` -- NotImplemented -- remaining bits are all `0`
54* `10111` -- Unbound -- remaining bits are all `0`
55* `11111` -- None -- remaining bits are all `1` -- this is so we can
56 `memset(obj.address(), -1, sizeof(obj))` to fill a heap-allocated object with
57 `None`
58
59With this design, many frequent objects are stored as immediate values and many
60operations don't need the extra indirection.
61
62Why this tagging scheme? Having small integers tagged with an extra zero at the
63end makes it possible to use many arithmetic instructions without additional
64work. To sum up two small integers, you just add the two RawObjects and the
65result is automatically the right small integer -- you just need to check for
66overflow. For heap-allocated objects, memory address always ends with `00`
67because of default memory alignment, so we don't lose any bits by adding this
68tag. Dereferencing does not have any extra overhead because all modern CPUs can
69support dereferencing with offset.
70
71## Heap-allocated objects
72
73The first word of every heap-allocated object contains the *Header* - a special
74structure with metadata about the object. Headers are tagged in the same way as
75other objects, which makes it easier for the GC to identify the start of an
76object when scanning the heap. The metadata includes type identification
77(layout id) and object hash.
78
79The rest of the allocated memory are attributes. A few object have immediate
80attributes (`Float` has an attribute of type `double`, `LargeStr` contains a
81`byte` array), but most attributes are simply other `RawObject`s.
82
83While small integers and small strings can be stored as immediate objects,
84heap-allocated counterparts are needed for larger values. `LargeStr` is
85basically a byte array with a header; `LargeInt` is an array of digits of word
86length, representing the number in two's-complement form (again, with a
87header). Special abstract types `Int` and `Str` are needed to hide this
88implementation detail -- managed code needs to work with normal `int` and `str`
89types.
90
91## Attribute access
92
93TODO: document
94
95## Subtypes
96
97TODO: document