commits
On some compilers, the warnings below get generated. I think it's
something to do with not defining `_POSIX_C_SOURCE` when we include
`time.h` but that didn't work--so it might be a project-level setting
that we need to tweak.
In the meantime, forward declare the structs in `cpython-types.h`.
```
In file included from /home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-data.h:5,
from /home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/Python.h:15,
from signature.c:1:
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:1311:58: warning: ‘struct timespec’ declared inside parameter list will not be visible outside of this definition or declaration
1311 | PyAPI_FUNC_DECL(int _PyTime_AsTimespec(_PyTime_t, struct timespec*));
| ^~~~~~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:21:70: note: in definition of macro ‘PyAPI_FUNC_DECL’
21 | #define PyAPI_FUNC_DECL(DECL) __attribute__((visibility("default"))) DECL
| ^~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:1312:57: warning: ‘struct timeval’ declared inside parameter list will not be visible outside of this definition or declaration
1312 | PyAPI_FUNC_DECL(int _PyTime_AsTimeval(_PyTime_t, struct timeval*,
| ^~~~~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:21:70: note: in definition of macro ‘PyAPI_FUNC_DECL’
21 | #define PyAPI_FUNC_DECL(DECL) __attribute__((visibility("default"))) DECL
| ^~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:1316:65: warning: ‘struct timeval’ declared inside parameter list will not be visible outside of this definition or declaration
1316 | PyAPI_FUNC_DECL(int _PyTime_AsTimeval_noraise(_PyTime_t, struct timeval*,
| ^~~~~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:21:70: note: in definition of macro ‘PyAPI_FUNC_DECL’
21 | #define PyAPI_FUNC_DECL(DECL) __attribute__((visibility("default"))) DECL
| ^~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:1325:64: warning: ‘struct timespec’ declared inside parameter list will not be visible outside of this definition or declaration
1325 | PyAPI_FUNC_DECL(int _PyTime_FromTimespec(_PyTime_t* tp, struct timespec* ts));
| ^~~~~~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:21:70: note: in definition of macro ‘PyAPI_FUNC_DECL’
21 | #define PyAPI_FUNC_DECL(DECL) __attribute__((visibility("default"))) DECL
| ^~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:1326:63: warning: ‘struct timeval’ declared inside parameter list will not be visible outside of this definition or declaration
1326 | PyAPI_FUNC_DECL(int _PyTime_FromTimeval(_PyTime_t* tp, struct timeval* tv));
| ^~~~~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:21:70: note: in definition of macro ‘PyAPI_FUNC_DECL’
21 | #define PyAPI_FUNC_DECL(DECL) __attribute__((visibility("default"))) DECL
| ^~~~
```
This signature should be more like `dict.update`. This is also needed
for setuptools.
Subclasses in the wild write `@property` functions that shadow the
attributes. Since they are getter only, the builtin attribute setting
fails with an `AttributeError`:
```
======================================================================
ERROR: test_set_path_attr_on_import_error_subclass_with_property_sets_attr (__main__.ExceptionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "library/builtins_test.py", line 4968, in test_set_path_attr_on_import_error_subclass_with_property_sets_attr
c = C("a path")
File "library/builtins.py", line 1037, in __init__
self.path = path
AttributeError: can't set attribute
```
If we use `_instance_setattr`, this works.
This may be a more general problem since CPython does all this stuff in
C and our builtins are all in Python, but I am not sure how to approach
that yet.
This speeds up things like `collections.deque()` where `collections` is
a module.
Add tests.
This reverts commit a23d3a08d5450c5c3dd07a9a9f2b1b0cb378b037.
`collections.deque` now has `__setitem__`. They were implemented in
D25245335 (fafcc34f308645af90fef67da51573b44f31f004).
Cache them as globals instead. It's faster (for now).
Looking it up off the type all the time is slow.
In the long run we should be able to just fold this at optimize time but
we can't right now.
Start by caching `range.__new__` as the ctor, but specialize in the
interpreter at each call site to a different ctor. Handle all three
cases:
* range(stop)
* range(start, stop)
* range(start, stop, step)
Bypass `_type_dunder_call` in cases where types do not define a
`__new__` and are not a metaclass. Cache the `__init__` and call it
directly.
Add tests.
This is helpful for debugging, especially when the LayoutId in question
is not built-in and is not a pretty-printed enum value in gdb.
They are tuple subclasses and this is fairly straightforward. The only
surprising thing is that they only unpack the in-sequence attributes,
not the in-object ones.
Add tests.
Don't give up entirely; we still benefit from having some.
Also, rewrite bytecode in one pass instead of two. This skips a bunch of
extra work, especially for opcodes that do not need caches.
This should have been there in the first place but I was having some
weird issue with it not being called. Whatever the issue is, it's gone
now.
This gets a lot of platform-specific optimizations and is a compiler
intrinsic. It should be able to handle overlap cases.
We may need to replace this later when we have read/write barriers, but
I think the idea of these functions is that they are supposed to be
atomic copies anyway.
* Move CPython out of Skybison build and test step
No need to do everything serially.
* Use Ninja to build instead of Make
This makes for quicker iteration on optimizing a specific benchmark.
If we can avoid it, don't fully `strFormat` a string and set an
`AttributeError` on the `Thread`. There's no need; we're immediately
going to clear it.
To signal that we should not raise an `AttributeError`, set the initial
`location_out` to `Unbound::object()` (instead of the normal call path,
which sets it to `NoneType::object()`).
If we can avoid it, don't fully `strFormat` a string and set an
`AttributeError` on the `Thread`. There's no need; we're immediately
going to clear it.
To signal that we should not raise an `AttributeError`, set the initial
`location_out` to `Unbound::object()` (instead of the normal call path,
which sets it to `NoneType::object()`).
nqueens calls `set(...)` a lot and this should speed that up.
Don't make a monomorphic cache on failure, though; just look up the
method as usual.
Without asm implementation:
```
"nqueens": {
"benchmark": "nqueens",
"cg_instructions before": 3293749025,
"cg_instructions now": 3207631962,
"cg_instructions ∆": "-2.6%",
"interpreter_name": "pyro",
"version before": "80d8286edd5f6f19db0d5177b6490b89d9133765",
"version now": "e0f0577d47ff5986029f885e5b502732238a3126"
},
```
With asm implementation:
```
"nqueens": {
"benchmark": "nqueens",
"cg_instructions before": 3293749025,
"cg_instructions now": 3199172980,
"cg_instructions ∆": "-2.9%",
"interpreter_name": "pyro",
"version before": "80d8286edd5f6f19db0d5177b6490b89d9133765",
"version now": "e49774d6da690b979a5ab645f53ab31fd1debde5"
},
```
0.3%. Not bad.
* Run quickbench with Django on pull requests
* Try using quickbench builds to run and format the other benchmarks
* Remove older benchmarking method
* Update to github-script v6
Cases like `int(a_str)` are way more common; check for those first.
The function is not supposed to take **kwargs*; it should raise a
TypeError. This also speeds up the creation of reversed() objects by not
allocating an empty dict with every call.
Add tests.
```json
{
"django_minimal_requests": {
"benchmark": "django_minimal_requests",
"cg_instructions before": 694510,
"cg_instructions now": 693934,
"cg_instructions ∆": "-0.1%",
"interpreter_args": [],
"interpreter_name": "pyro",
"version before": "cd35adcc708c74149e7d2f9ec197d3c1e46fed4a",
"version now": "f88097c5045a760ab9494a43cbbe897392376cb1"
}
}
```
This DCHECK is idealistic but the world does not yet match this
invariant. See #269 and #460 for more information, but the gist is that
we do not remove dependency links up the type hierarchy, so there are
some leftover links after eviction.
Add tests.
This was useful in tracking down an IC bug.
Add tests.
This only works if the receiver's type has not overridden `__class__`.
Fixes #454
Add tests.
This is often the long pole in the tent for startup (if you haven't
pre-compiled your code) and is worth occasionally measuring.
Fixes #455
This is great because the mold linker is at least 10x faster than
whatever the default for clang is. Makes repeated modification much
easier, especially with ccache.
Fixes #437
Local variables are implemented with `LOAD_FAST`/`STORE_FAST`/`DELETE_FAST`.
The implementation of `LOAD_FAST` looks like this:
```c++
HANDLER_INLINE Continue Interpreter::doLoadFast(Thread* thread, word arg) {
Frame* frame = thread->currentFrame();
RawObject value = frame->local(arg);
if (UNLIKELY(value.isErrorNotFound())) {
HandleScope scope(thread);
Str name(&scope, Tuple::cast(Code::cast(frame->code()).varnames()).at(arg));
thread->raiseWithFmt(LayoutId::kUnboundLocalError,
"local variable '%S' referenced before assignment",
&name);
return Continue::UNWIND;
}
thread->stackPush(value);
return Continue::NEXT;
}
```
This check is required because:
* Python has function-local scope and variables can be used before they are initialized;
* For simplicity, the bytecode compiler does not check if a variable is initialized at its use;
* Variables can also be removed from scope with `del`
For example,
```python
def foo(x):
return x
```
Even though `x` is clearly defined at its use, the interpreter still (before this commit) checks.
Instead of doing this check on every local load, augment the compiler with a definite assignment analysis pass. Remove the interpreter-rewrite solution that only handled parameters and in functions without `del`.
Add tests.
Add and modify tests.
Fix #428
`function.__new__` needs globals to be a dict (or, for us, a module
proxy would work too). The function was written in D20167616 and even
had a test for `function.__new__` with a non-dict globals, but this
relied on undefined behavior inside the runtime: frames used to be
`memset` to help `LOAD_FAST` check if a value was set, but this was
showing up in benchmarks. So instead we relaxed the constraint of
raising an `UnboundLocalError` and moved on with our lives. Now that we
are finally fixing this, we came across the bug of a conditionally
defined `mod`.
Finally, finally, finally, fix the bug! :)
This fixes references to _compiler_opcode.opcode.SOME_OPCODE if it's a
synonym. Previously _compiler_opcode.opcode.BINARY_SUBSCR_ANAMORPHIC
would refer to BINARY_SUBSCR.
This is important for control flow analysis.
This de-clutters the comment a bit.
This will be useful when adding LOAD_FAST optimization.
This is causing tests to fail. Investigate more later.
This reverts commit 8519e1fc2235d992eedcf5568f8adc60281c58f3.
This doesn't fix any particular problem right now, but I came across it.
This is definitely also used somewhere else in the project and I do not
remember where.
GCC is throwing errors about unknown pragmas.
This should hopefully be elided by the compiler and not be a perf
regression.
Also buffer stdout until there is an error.
We rely on stdout for output; don't clutter it with messages.
Use memcpy instead.
Calling methods on invalid objects (0x23, etc) is undefined behavior and
causing crashes.
Fix #424
This makes CI easier to test without stacking.
This can help measure attribute lookups of simple @property functions.
On some compilers, the warnings below get generated. I think it's
something to do with not defining `_POSIX_C_SOURCE` when we include
`time.h` but that didn't work--so it might be a project-level setting
that we need to tweak.
In the meantime, forward declare the structs in `cpython-types.h`.
```
In file included from /home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-data.h:5,
from /home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/Python.h:15,
from signature.c:1:
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:1311:58: warning: ‘struct timespec’ declared inside parameter list will not be visible outside of this definition or declaration
1311 | PyAPI_FUNC_DECL(int _PyTime_AsTimespec(_PyTime_t, struct timespec*));
| ^~~~~~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:21:70: note: in definition of macro ‘PyAPI_FUNC_DECL’
21 | #define PyAPI_FUNC_DECL(DECL) __attribute__((visibility("default"))) DECL
| ^~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:1312:57: warning: ‘struct timeval’ declared inside parameter list will not be visible outside of this definition or declaration
1312 | PyAPI_FUNC_DECL(int _PyTime_AsTimeval(_PyTime_t, struct timeval*,
| ^~~~~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:21:70: note: in definition of macro ‘PyAPI_FUNC_DECL’
21 | #define PyAPI_FUNC_DECL(DECL) __attribute__((visibility("default"))) DECL
| ^~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:1316:65: warning: ‘struct timeval’ declared inside parameter list will not be visible outside of this definition or declaration
1316 | PyAPI_FUNC_DECL(int _PyTime_AsTimeval_noraise(_PyTime_t, struct timeval*,
| ^~~~~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:21:70: note: in definition of macro ‘PyAPI_FUNC_DECL’
21 | #define PyAPI_FUNC_DECL(DECL) __attribute__((visibility("default"))) DECL
| ^~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:1325:64: warning: ‘struct timespec’ declared inside parameter list will not be visible outside of this definition or declaration
1325 | PyAPI_FUNC_DECL(int _PyTime_FromTimespec(_PyTime_t* tp, struct timespec* ts));
| ^~~~~~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:21:70: note: in definition of macro ‘PyAPI_FUNC_DECL’
21 | #define PyAPI_FUNC_DECL(DECL) __attribute__((visibility("default"))) DECL
| ^~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:1326:63: warning: ‘struct timeval’ declared inside parameter list will not be visible outside of this definition or declaration
1326 | PyAPI_FUNC_DECL(int _PyTime_FromTimeval(_PyTime_t* tp, struct timeval* tv));
| ^~~~~~~
/home/max/Documents/code/skybison/build-relwithdebinfo/include/skybison3.8/cpython-func.h:21:70: note: in definition of macro ‘PyAPI_FUNC_DECL’
21 | #define PyAPI_FUNC_DECL(DECL) __attribute__((visibility("default"))) DECL
| ^~~~
```
Subclasses in the wild write `@property` functions that shadow the
attributes. Since they are getter only, the builtin attribute setting
fails with an `AttributeError`:
```
======================================================================
ERROR: test_set_path_attr_on_import_error_subclass_with_property_sets_attr (__main__.ExceptionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "library/builtins_test.py", line 4968, in test_set_path_attr_on_import_error_subclass_with_property_sets_attr
c = C("a path")
File "library/builtins.py", line 1037, in __init__
self.path = path
AttributeError: can't set attribute
```
If we use `_instance_setattr`, this works.
This may be a more general problem since CPython does all this stuff in
C and our builtins are all in Python, but I am not sure how to approach
that yet.
If we can avoid it, don't fully `strFormat` a string and set an
`AttributeError` on the `Thread`. There's no need; we're immediately
going to clear it.
To signal that we should not raise an `AttributeError`, set the initial
`location_out` to `Unbound::object()` (instead of the normal call path,
which sets it to `NoneType::object()`).
If we can avoid it, don't fully `strFormat` a string and set an
`AttributeError` on the `Thread`. There's no need; we're immediately
going to clear it.
To signal that we should not raise an `AttributeError`, set the initial
`location_out` to `Unbound::object()` (instead of the normal call path,
which sets it to `NoneType::object()`).
Don't make a monomorphic cache on failure, though; just look up the
method as usual.
Without asm implementation:
```
"nqueens": {
"benchmark": "nqueens",
"cg_instructions before": 3293749025,
"cg_instructions now": 3207631962,
"cg_instructions ∆": "-2.6%",
"interpreter_name": "pyro",
"version before": "80d8286edd5f6f19db0d5177b6490b89d9133765",
"version now": "e0f0577d47ff5986029f885e5b502732238a3126"
},
```
With asm implementation:
```
"nqueens": {
"benchmark": "nqueens",
"cg_instructions before": 3293749025,
"cg_instructions now": 3199172980,
"cg_instructions ∆": "-2.9%",
"interpreter_name": "pyro",
"version before": "80d8286edd5f6f19db0d5177b6490b89d9133765",
"version now": "e49774d6da690b979a5ab645f53ab31fd1debde5"
},
```
0.3%. Not bad.
The function is not supposed to take **kwargs*; it should raise a
TypeError. This also speeds up the creation of reversed() objects by not
allocating an empty dict with every call.
Add tests.
```json
{
"django_minimal_requests": {
"benchmark": "django_minimal_requests",
"cg_instructions before": 694510,
"cg_instructions now": 693934,
"cg_instructions ∆": "-0.1%",
"interpreter_args": [],
"interpreter_name": "pyro",
"version before": "cd35adcc708c74149e7d2f9ec197d3c1e46fed4a",
"version now": "f88097c5045a760ab9494a43cbbe897392376cb1"
}
}
```
Local variables are implemented with `LOAD_FAST`/`STORE_FAST`/`DELETE_FAST`.
The implementation of `LOAD_FAST` looks like this:
```c++
HANDLER_INLINE Continue Interpreter::doLoadFast(Thread* thread, word arg) {
Frame* frame = thread->currentFrame();
RawObject value = frame->local(arg);
if (UNLIKELY(value.isErrorNotFound())) {
HandleScope scope(thread);
Str name(&scope, Tuple::cast(Code::cast(frame->code()).varnames()).at(arg));
thread->raiseWithFmt(LayoutId::kUnboundLocalError,
"local variable '%S' referenced before assignment",
&name);
return Continue::UNWIND;
}
thread->stackPush(value);
return Continue::NEXT;
}
```
This check is required because:
* Python has function-local scope and variables can be used before they are initialized;
* For simplicity, the bytecode compiler does not check if a variable is initialized at its use;
* Variables can also be removed from scope with `del`
For example,
```python
def foo(x):
return x
```
Even though `x` is clearly defined at its use, the interpreter still (before this commit) checks.
Instead of doing this check on every local load, augment the compiler with a definite assignment analysis pass. Remove the interpreter-rewrite solution that only handled parameters and in functions without `del`.
Add tests.
`function.__new__` needs globals to be a dict (or, for us, a module
proxy would work too). The function was written in D20167616 and even
had a test for `function.__new__` with a non-dict globals, but this
relied on undefined behavior inside the runtime: frames used to be
`memset` to help `LOAD_FAST` check if a value was set, but this was
showing up in benchmarks. So instead we relaxed the constraint of
raising an `UnboundLocalError` and moved on with our lives. Now that we
are finally fixing this, we came across the bug of a conditionally
defined `mod`.
Finally, finally, finally, fix the bug! :)