
Commit 980f2d7

blah
1 parent 14deba3 commit 980f2d7

File tree

3 files changed: +289 -12 lines changed


conditionals.rkt

Lines changed: 11 additions & 11 deletions
@@ -128,7 +128,7 @@
          (append* (map (collect-locals) els)))]
       [else ((super collect-locals) ast)])))

-    (define optimize-if #t)
+    (define optimize-if #f)

     (define/public (flatten-if new-thn thn-ss new-els els-ss)
       (lambda (cnd)
@@ -163,16 +163,16 @@
                    ,(append thn-ss (list thn-ret))
                    ,(append els-ss (list els-ret))))))]
        [else
-         (let-values ([(new-cnd cnd-ss)
-                       ((flatten #t) `(has-type ,cnd ,t))])
-           (define tmp (gensym 'if))
-           (define thn-ret `(assign ,tmp ,new-thn))
-           (define els-ret `(assign ,tmp ,new-els))
-           (values `(has-type ,tmp ,t)
-                   (append cnd-ss
-                           `((if (eq? (has-type #t Boolean) ,new-cnd)
-                                 ,(append thn-ss (list thn-ret))
-                                 ,(append els-ss (list els-ret)))))))])]
+         (define-values (new-cnd cnd-ss) ((flatten #t)
+                                          `(has-type ,cnd ,t)))
+         (define tmp (gensym 'if))
+         (define thn-ret `(assign ,tmp ,new-thn))
+         (define els-ret `(assign ,tmp ,new-els))
+         (values `(has-type ,tmp ,t)
+                 (append cnd-ss
+                         `((if (eq? (has-type #t Boolean) ,new-cnd)
+                               ,(append thn-ss (list thn-ret))
+                               ,(append els-ss (list els-ret))))))])]
       [other (error 'flatten-if "unmatched ~a" other)])))

     (define/override (flatten need-atomic)

schedule.txt

Lines changed: 9 additions & 1 deletion
@@ -26,7 +26,13 @@ March
 April
 ---------------------------------------------
 4 due: Type Dynamic, Dynamic Typing (2 weeks)
-  Lecture topic: ??
+  Lecture topic: expose basic blocks
+
+6 Lecture topic: objects
+
+11 Lecture topic: tail call opt. (Michael)
+13 Lecture topic: set! (Chris)
+



@@ -49,6 +55,8 @@ Student Projects
 * alternative register allocator
 * parametric polymorphism
 * type classes (perhaps too difficulty?)
+* loops and loop optimization
+* object-oriented features


 25 projects due

tail-call-lecture.txt

Lines changed: 269 additions & 0 deletions
@@ -0,0 +1,269 @@
Lecture notes addendum: The one thing we discussed that's not in the
lecture notes is how to do stack arguments for tail calls: it turns
out that the recursive caller has to write the stack arguments into
its caller's stack argument zone. This gets hairier when you have
non-self-recursive tail calls, because then the original caller needs
to allocate stack space for arguments that it never writes to: for
example, if A non-tail-calls B (which has 3 arguments), and B
tail-calls C (which has 10 arguments), A has to allocate stack space
for 4 stack arguments even though it itself never calls anything that
uses stack arguments.
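
As a quick arithmetic check of that example (a sketch only, assuming
the usual six argument-passing registers; stack-arg-slots is a
hypothetical helper, not a pass in the compiler):

;; How many of a function's arguments spill to the stack, assuming the
;; first six travel in registers (rdi, rsi, rdx, rcx, r8, r9).
(define num-arg-registers 6)

(define (stack-arg-slots arity)
  (max 0 (- arity num-arg-registers)))

(stack-arg-slots 3)   ; => 0: B itself needs no stack arguments
(stack-arg-slots 10)  ; => 4: but B tail-calls C, so A must reserve these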


Today I wanted to talk about one important optimization in compilers:
tail call optimization.

This might end up being a pretty quick lecture because the
optimization itself is pretty straightforward, but this is an
optimization important enough that it's actually explicit in the
Racket specification: you don't technically have a "Racket
implementation" until you have this. To see what this is about, let's
take a look at the "same" program written in two languages. First of
all, Racket:

[pull up racket_infinite_loop.rkt]
(define (f)
  (f))
(f)

So, this is obviously an infinite loop, right? That's cool, we're down
with infinite loops. Let's run this program. [do so] And look at
that, it's looping infinitely.

Now, compare this with another program that looks identical and
should behave identically in theory, except this time, it's written in
Python.

[pull up python_infinite_loop.py]
def f():
    f()
f()

Let's run this version. [do so] And this time, the program
explodes. The difference is that while Racket performs tail call
optimization, Python somewhat famously doesn't. And while we might not
be too concerned about this program in particular, there are lots of
functions that are invoked recursively many times before eventually
returning a result --- we don't want to have to worry about our
program dying if a function loops more than an arbitrary number of
times.

So specifically, the thing that causes Python to die here is that
every time the function is recursively applied, a new stack frame is
allocated, and eventually, the stack just takes up so much space that
the execution is automatically halted. Python, of course, is an
interpreted language (as is regular Racket), but basically the same
principle holds in our compiled code: every time a recursive call
happens, the prelude of the function is going to push all the
callee-save registers and move the stack pointer to create enough
space for spilled variables. Eventually, this is going to result in a
segfault as the stack space used by the program exceeds that which is
provided to it by the OS (or at least that's my understanding).

The answer that tail call elimination provides is that in some
circumstances we can simply reuse the current stack frame. We can only
do this if the recursive call that would be generating the new frame
is the very last thing that happens in this particular iteration of
the function call. So, given this program, we can achieve efficiency
by changing the callq instruction generated for the recursive call
into a straight, unconditional jump to the beginning of the procedure
code. There are some subtleties that I'll get into, but that's really
the core of it.

To implement this, the first step is to mark which recursive calls can
be eliminated. I recommend doing this as part of reveal-functions,
because when the program is still in Racket form it's very clear which
calls are in tail position and which aren't. For now, we're only
talking about eliminating recursive tail calls, so only calls inside
(define)'d functions can be eliminated.

So given a program like this:

(define (times n m)
  (if (eq? n 0)
      0
      (+ m (times (- n 1) m))))

we will do our normal revealing of functions:

(define (times n m)
  (if (eq? n 0)
      0
      (+ m (app (function-ref times) (- n 1) m))))

since the addition happens after the recursive call, we can't reuse
the stack frame for the call: there are still things happening within
the current frame after the call returns. But if we change the program
like so:

(define (times-iter n m prod)
  (if (eq? n 0)
      prod
      (times-iter (- n 1) m (+ prod m))))

we can see that the recursive call is at the end of one path through
the function, so we can change it to

(define (times-iter n m prod)
  (if (eq? n 0)
      prod
      (tailcall (function-ref times-iter) (- n 1) m (+ prod m))))
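
One way to get there inside reveal-functions is to thread a flag
through the pass that records whether the current expression is in
tail position. This is only a sketch under my own naming (reveal-exp,
funs, and tail? are placeholders for however your pass is organized),
and it handles just enough forms to show the idea:

;; Sketch: funs is the list of (define)'d function names; tail? says
;; whether e sits in tail position. A call to a known function in tail
;; position becomes a tailcall, any other call becomes a plain app.
;; Arguments and the if condition are never in tail position; both if
;; branches inherit it.
(define (reveal-exp e funs tail?)
  (match e
    [(or (? symbol?) (? integer?) (? boolean?)) e]
    [`(if ,cnd ,thn ,els)
     `(if ,(reveal-exp cnd funs #f)
          ,(reveal-exp thn funs tail?)
          ,(reveal-exp els funs tail?))]
    [`(,f ,args ...)
     #:when (memq f funs)
     (define new-args (for/list ([a args]) (reveal-exp a funs #f)))
     (if tail?
         `(tailcall (function-ref ,f) ,@new-args)
         `(app (function-ref ,f) ,@new-args))]
    [`(,op ,args ...)
     `(,op ,@(for/list ([a args]) (reveal-exp a funs #f)))]))

The body of each (define) then gets revealed with tail? set to #t, and
the top-level expression with #f, since for now only calls inside
defined functions are candidates.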

The downside of introducing a new AST form this early, of course, is
that we have to propagate it all the way through the rest of our
passes. Once we get into the C language, I recommend treating tailcall
as a new statement, just like assign and return. This is because it
really is more like the return statement than anything else: it's not
going to write a result to the LHS, like an assign would, it's just
going to do a jump and then relinquish control over returning the
result to the callee. You'll have to propagate this statement through
all the C passes, but it's very straightforward to do so, with one
exception.
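
For concreteness, here is roughly what the tail branch of times-iter
could look like once it reaches the C-like language, with the tailcall
statement sitting where a return would otherwise go (the temporary
names are invented for this sketch):

(assign tmp1 (- n 1))
(assign tmp2 (+ prod m))
(tailcall (function-ref times-iter) tmp1 m tmp2)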

The one pass that is interesting among the C passes is
uncover-call-live-roots. Remember, one of the things that this pass
does is, when it sees a function call:

(assign lhs (app (function-ref f) y z))

Given the set '(somevector someclosure) of live heap values, we have
to compile this to

(call-live-roots (somevector someclosure) (assign lhs (app (function-ref f) y z)))

And this will in turn get compiled into instructions that push
somevector and someclosure onto the root stack and then pop them off
afterwards, so that they don't get garbage collected while the
recursive call is executing.

But this seems like a really bad thing for tail calls, right? We can't
do anything after a tail call, because we've wiped out this stack
frame and we'll never return to it.

QUESTION: what's the solution here?

ANSWER: Actually, there isn't a problem at all! Like I said, we're
never returning to this stack frame, so everything in the frame is
dead at the point that we make the tail call, except for the arguments
to the function. And the callee won't collect those if they're alive
when it executes. So in fact, we don't need to insert a
call-live-roots around tailcalls.

When I said that this pass was interesting, I meant that it's
interesting for what we _don't_ have to do, rather than what we do.
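
In code, the new case is nearly a no-op (a sketch; uncover-stmt and
live-roots are stand-ins for however your uncover-call-live-roots is
actually structured):

;; Sketch of the statement-level handler: an ordinary call gets wrapped
;; in call-live-roots, but a tailcall passes through untouched, because
;; nothing in the current frame is live after it.
(define (uncover-stmt stmt live-roots)
  (match stmt
    [`(assign ,lhs (app ,f ,args ...))
     `(call-live-roots ,live-roots (assign ,lhs (app ,f ,@args)))]
    [`(tailcall ,f ,args ...)
     stmt]
    [_ stmt]))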

Now, like regular apps, tailcalls go away when we perform instruction
selection. Recall that when we did instruction selection on regular
apps, we turned them into "indirect callq"s

(indirect-callq (reg rax))

which eventually we print out as

callq *%rax

It shouldn't be surprising that there's an analogous form for indirect
jumps: we'll introduce

(indirect-jmp (reg rax))

into our pseudo-x86 language and then print it out as

jmp *%rax
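
The select-instructions case for tailcall can mirror the one for app
(a sketch only; select-tailcall and arg-registers are my names, and
loading the function value through rax just follows what the app case
already does):

;; Sketch: load the target, move the arguments into the argument
;; registers, then jump instead of calling. No return address gets
;; pushed and no new frame gets built.
(define arg-registers '(rdi rsi rdx rcx r8 r9))

(define (select-tailcall f args)
  (append
   `((movq ,f (reg rax)))                   ; function value into rax
   (for/list ([a args] [r arg-registers])   ; arguments into arg registers
     `(movq ,a (reg ,r)))
   `((indirect-jmp (reg rax)))))            ; jump, don't call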

So our times function will end up looking something like this:

    .globl times
times:
    pushq %rbp
    movq %rsp, %rbp
    pushq %r14
    pushq %r13
    pushq %r12
    pushq %rbx
    subq $16, %rsp

    movq %rdi, n
    movq %rsi, m
    movq %rdx, prod

    ... do stuff ...

    leaq times(%rip), %r12
    ... closure stuff ...

    movq n, %rdi
    movq m, %rsi
    movq prod, %rdx
    jmp *%r12

    ... other cases ...

And then we're cool, right? Except ---

QUESTION: Can anybody spot what the problem is here?

ANSWER: We're jumping to _before_ the function prelude, so for every
iteration, we're still doing some of the work to allocate a new frame:
pushing the base pointer, allocating stack space, etc. That's bad!

The way that I solved this, which perhaps isn't the cleverest, is to
introduce a new label that marks the end of the prelude and the
beginning of the function's body. I then used that as the indirect-jmp
target:

    .globl times
times:
    ... prelude ...

times_body:
    movq %rdi, n
    movq %rsi, m
    movq %rdx, prod

    ... do stuff ...

    leaq times_body(%rip), %r12
    ... stuff ...

Then allllll the way back in the modified version of reveal-functions,
I change function-refs within tailcalls to append "body" to the target
label. Maybe a better way to do this, though, is to make sure that the
prelude is always of constant length and then jump to an offset from
the function entry label.
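
That retargeting is a small rewrite (a sketch; label-body and
retarget-tailcall are hypothetical helpers, and in my version the
renaming happens wherever reveal-functions builds the tailcall form):

;; Sketch: point a tailcall at the post-prelude label,
;; e.g. times -> times_body.
(define (label-body f)
  (string->symbol (format "~a_body" f)))

(define (retarget-tailcall stmt)
  (match stmt
    [`(tailcall (function-ref ,f) ,args ...)
     `(tailcall (function-ref ,(label-body f)) ,@args)]
    [_ stmt]))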

So, that's the story for recursive tail calls. What about tail calls
to other functions, or mutually recursive tail calls? For example,

(define (odd n)
  (if (eq? n 0)
      #f
      (even (- n 1))))
(define (even n)
  (if (eq? n 0)
      #t
      (odd (- n 1))))

The recursive calls to even and odd are in tail position, so we want
to eliminate them too. We can do so, but in general this requires a
bit of care, because now we are going to reuse the same stack frame
for calls to different functions. So when we select instructions for a
function with tail calls to other functions, we need to make sure that
the frame has enough space for the spilled locals and arguments for
_every_ function that can get tailcalled, transitively: if function A
tailcalls function B which tailcalls function C, then function A's
stack frame needs to be the max of what A, B, and C alone would need.
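
A sketch of that computation (own-frame-size and tail-targets are
assumed lookups you would build from your own passes; the loop just
propagates sizes along tail-call edges until nothing changes, which
also handles cycles like odd/even):

;; Sketch: give every function the max frame size over everything it
;; can reach through tailcalls.
(define (transitive-frame-sizes funs own-frame-size tail-targets)
  (define sizes (make-hash))
  (for ([f funs]) (hash-set! sizes f (own-frame-size f)))
  (let loop ()
    (define changed? #f)
    (for* ([f funs] [g (tail-targets f)])
      (define new-size (max (hash-ref sizes f) (hash-ref sizes g)))
      (unless (= new-size (hash-ref sizes f))
        (hash-set! sizes f new-size)
        (set! changed? #t)))
    (when changed? (loop)))
  sizes)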

Alternatively, we can roll back the stack space and jump to after
saving the locals but before allocating stack space.

This also means that we have to have this information to perform a
tailcall; if we don't have it, we have to fall back to normal calls.
For example,

(define (foo fn)
  (fn 42))

The call to fn is in tail position here, but fn is an arbitrary
function value, and we have no way of knowing how much stack space is
needed for it. For that reason, we have to fall back to regular calls.
