Skip to content

Superoptimize sequences of instructions without observable side-effects in the bytecode compiler. #102869

Open
@markshannon

Description

@markshannon

Sequences of LOAD_FAST, COPY, LOAD_CONST, SWAP, POP_TOP, NOP (and other?) instructions have no observable side-effects as the evaluation stack cannot be observed.
Therefore we can replace any sequence of these instructions with a shorter sequence that has the same effect.
By using a superoptimizer we replace sequences with the optimal equivalent sequence.

In practice we will want to limit the size of lookup tables we generate, but we can generate complete tables for sequences up to 4 or 5.

There is a slight complication of line numbers, but as long the number of line numbers does not exceed the optimal sequence length, we can just allocate instructions line numbers without regard to the original mapping of instructions to line numbers.

Generating and using tables.

During build (or offline), we can generate a table mapping sequences of instructions to the stack, recording the optimum sequence for each stack.
E.g. The sequences LOAD_FAST x; LOAD_FAST y; SWAP 2 and LOAD_FAST y; LOAD_FAST x generate the stack [ y, x ].
So the lookup table maps [ y, x ] to LOAD_FAST y; LOAD_FAST x.

Whenever we see a sequence of instructions without observable side effects, we compute the stack, and look it up. If we find a match, emit the optimal sequence.

Mimimizing the size of tables

We will need to use some sort of numbering scheme for the indexes of LOAD_FAST and LOAD_CONST, and treat LOAD_CONST the same as LOAD_FAST.

So that LOAD_FAST a; LOAD_CONST 1.0 becomes LOAD 1; LOAD 2.

In order to keep the table size down, we should limit the number of distinct locals and constants to 4, and the depth of the stack to something like 7. That way there are only a few thousand possible stacks, and the optimal sequence for each cannot be more than 4 instructions. The entire table should then fit into tens of kbytes of const data.

Handling line numbers.

Since the instructions have no visible side-effects, we can place the line events anywhere in the instruction sequence.
Taking the above sequence with line numbers:

4 LOAD_FAST x
  LOAD_FAST y
5 SWAP 2

We can generate:

4 LOAD_FAST y
5 LOAD_FAST x

which is valid as it produces the same sequence of line events and the observable state of the VM is the same in both cases.

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    interpreter-core(Objects, Python, Grammar, and Parser dirs)performancePerformance or resource usage

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions