Compilers
Table of Contents
1. Policy
Attend lectures only if you want to. You are expected to keep up to date on the discussions that happen during lectures and labs.
1.1. Evaluation
Assessment | Percentage | Remarks |
---|---|---|
Lab quizzes | 30 | Small programming exercises |
Semester-long project | 50 | Team of 4. |
Mid-semester exam | 10 | |
End-semester exam | 10 |
1.2. Resources
There is no textbook. Follow the lectures. Code a lot. Read necessary theory.
Bob Nystrom's Crafting Interpreters is a good book. I will disagree with this book on some of the topics.
2. Lectures
2.1. Calculator back-end
You have all used compilers and you know what they do. In this course, we will build one and explore how compilers are organized and the relevant algorithms.
The most common data structure in a compiler is a tree and the most
common algorithm, the post-order traversal. To see why, let us build a
simple calculator. It accepts expressions and evaluates them. What is
the best way to represent an expression like 2 + 3*5
in a computer?
+ / \ 2 * / \ 3 5
Yes. A tree. How would you evaluate this expression? Evaluate the left sub-tree, then the right sub-tree, then add the results. That is, a post-order traversal.
The front-end of the compiler has to take the string 2 + 3*5
and
build this tree. This is done using two classes of algorihms called
lexing and parsing, which correspond to DFAs and CFGs respectively. We
will see this part later. First, we will build the back-end for the
compiler. It takes trees as input and evaluates them.
First, let us define some classes in Python to represent expression trees.
from dataclasses import dataclass class AST: pass @dataclass class BinOp(AST): op: str left: AST right: AST @dataclass class Number(AST): val: str
The tree for the previous expression can be created so:
expr_t1 = BinOp("+", Number("2"), BinOp("*", Number("3"), Number("5"))) print(expr_t1)
BinOp(op='+', left=Number(val='2'), right=BinOp(op='*', left=Number(val='3'), right=Number(val='5')))
Now, let us write an evaluator. This function should take the tree as input and produce a number.
def e(tree: AST) -> int: match tree: case Number(v): return int(v) case BinOp("+", l, r): return e(l) + e(r) case BinOp("*", l, r): return e(l) * e(r)
Let us test it by evaluating our sample expression.
e(expr_t1)
17
Yay! Now, extend this small program to make what you think a real calculator should be.
2.2. Calculator front-end
The front-end of the calculator should take a string as input and return the corresponding expression tree. We split this into a lexer and a parser.
The lexer converts a stream of characters into a stream of tokens. First, we define the tokens of our language.
class Token: pass @dataclass class NumberToken(Token): v: str @dataclass class OperatorToken(Token): o: str
Now, the lexer can produce the token stream.
from collections.abc import Iterator def lex(s: str) -> Iterator[Token]: i = 0 while True: while i < len(s) and s[i].isspace(): i = i + 1 if i >= len(s): return if s[i].isdigit(): t = s[i] i = i + 1 while i < len(s) and s[i].isdigit(): t = t + s[i] i = i + 1 yield NumberToken(t) else: match t := s[i]: case '+' | '*': i = i + 1 yield OperatorToken(t)
Let us test our lexer with our sample expression.
for t in lex("2 + 3*5"): print(t)
NumberToken(v='2') OperatorToken(o='+') NumberToken(v='3') OperatorToken(o='*') NumberToken(v='5')
The parser takes the token stream as input and produces the AST.
from collections.abc import Iterator def parse(s: str) -> AST: from more_itertools import peekable t = peekable(lex(s)) def parse_add(): ast = parse_mul() while True: match t.peek(None): case OperatorToken('+'): next(t) ast = BinOp('+', ast, parse_mul()) case _: return ast def parse_mul(): ast = parse_atom() while True: match t.peek(None): case OperatorToken('*'): next(t) ast = BinOp("*", ast, parse_atom()) case _: return ast def parse_atom(): match t.peek(None): case NumberToken(v): next(t) return Number(v) return parse_add()
Let us test the parser for some expressions.
print(parse("2")) print(parse("2+3")) print(parse("2+3*5"))
Number(val='2') BinOp(op='+', left=Number(val='2'), right=Number(val='3')) BinOp(op='+', left=Number(val='2'), right=BinOp(op='*', left=Number(val='3'), right=Number(val='5')))
To put the front-end and back-end together, we just have to compose
parse
and e
.
print(e(parse("2 + 3*5")))
17
Now is the time to extend this into a more useful calculator. I suggest adding the following features.
- All usual arithmetic operators, including the unary negation.
- Parenthesis for grouping.
- Use a proper number type for evaluation.
- Error-handling in the front-end.
2.3. Conditional expressions
We will implement a conditional expression of the form if a then b
else c end
. To do this, we need the following additions:
- Lexing: Add support for keywords like
if
. - Parsing: Add support for conditional expressions. This has a lower precedence than arithmetic.
- Evaluation: Modify the post-order traversal to conditionally traverse sub-trees.
As before, we will start by defining the AST node.
@dataclass class If(AST): c: AST t: AST e: AST
We will use BinOp
for comparison operators.
We want to handle expressions like:
expr_t2 = If(BinOp("<", Number("2"), Number("3")), Number("0"), Number("1")) expr_t3 = If(BinOp("<", Number("3"), Number("2")), Number("0"), Number("1"))
The evaluator is:
def e(tree: AST) -> int | bool: match tree: case Number(v): return int(v) case BinOp("+", l, r): return e(l) + e(r) case BinOp("*", l, r): return e(l) * e(r) case BinOp("<", l, r): return e(l) < e(r) case If(cond, then, else_): if e(cond): return e(then) else: return e(else_)
Let us test it with our sample expression.
print(e(expr_t2)) print(e(expr_t3))
0 1
Now, we will modify the front-end to allow the programmer to input these expressions. We will need a token type for keywords.
@dataclass class KeywordToken(Token): w: str
In lex
, if the first character is a letter, then we will try and
form a keyword token. We should also handle comparison operators.
from collections.abc import Iterator def lex(s: str) -> Iterator[Token]: i = 0 while True: while i < len(s) and s[i].isspace(): i = i + 1 if i >= len(s): return if s[i].isalpha(): t = s[i] i = i + 1 while i < len(s) and s[i].isalpha(): t = t + s[i] i = i + 1 # XXX: Should check here whether we got a valid keyword. yield KeywordToken(t) elif s[i].isdigit(): t = s[i] i = i + 1 while i < len(s) and s[i].isdigit(): t = t + s[i] i = i + 1 yield NumberToken(t) else: match t := s[i]: case '+' | '*' | '<': i = i + 1 yield OperatorToken(t)
Let us see what tokens are produced by our new lexer.
for t in lex("if 2 < 3 then 0 else 1 end"): print(t)
KeywordToken(w='if') NumberToken(v='2') OperatorToken(o='<') NumberToken(v='3') KeywordToken(w='then') NumberToken(v='0') KeywordToken(w='else') NumberToken(v='1') KeywordToken(w='end')
It looks fine. Now, we move on to parsing.
While parsing, we will peek at the first token. If it is the keyword
if
, then we will try to parse an if
expression. If it begins with
another token, we will try and parse an arithmetic expression.
class ParseError(Exception): pass def parse(s: str) -> AST: from more_itertools import peekable t = peekable(lex(s)) def expect(what: Token): if t.peek(None) == what: next(t) return raise ParseError def parse_if(): match t.peek(None): case KeywordToken("if"): next(t) cond = parse_if() expect(KeywordToken("then")) then = parse_if() expect(KeywordToken("else")) else_ = parse_if() expect(KeywordToken("end")) return If(cond, then, else_) case _: return parse_cmp() def parse_cmp(): l = parse_add() if t.peek(None) == OperatorToken('<'): next(t) r = parse_add() return BinOp('<', l, r) else: return l def parse_add(): ast = parse_mul() while True: match t.peek(None): case OperatorToken('+'): next(t) ast = BinOp('+', ast, parse_mul()) case _: return ast def parse_mul(): ast = parse_atom() while True: match t.peek(None): case OperatorToken('*'): next(t) ast = BinOp("*", ast, parse_atom()) case _: return ast def parse_atom(): match t.peek(None): case NumberToken(v): next(t) return Number(v) return parse_if()
Let us test it on a sample.
print(parse("if 2 < 3 then 0+5 else 1*6 end"))
If(c=BinOp(op='<', left=Number(val='2'), right=Number(val='3')), t=BinOp(op='+', left=Number(val='0'), right=Number(val='5')), e=BinOp(op='*', left=Number(val='1'), right=Number(val='6')))
2.4. Variables
Consider the following emacs lisp example:
(defvar v 1) (defun foo () (princ v)) (defun bar () (let ((v 10)) (foo))) (bar)
10
Compare it with the following Python code that is the same except in the scoping of variables.
v = 1 def foo(): print(v) def bar(): v = 10 foo() bar()
1
The v
in foo()
refers to the global v
in Python. The v
in
bar()
is local to that function and cannot escape unless it is
passed as an argument to foo()
. This is called lexical scoping. The
association of variable use to declaration is determined by program
text. Emacs lisp uses dynamic scoping, where the association is
determined by the state of execution.
(defvar v 1) (defun foo () (print v)) (defun bar (a) (if a (let ((v 10)) (foo)) (foo))) (bar nil) (bar t)
1 10
In the following example, observe that foo()
picks up either the
global v
or the v
local to bar()
depending on the argument to
bar()
.
1 10
Modern languages all use lexical scoping by default. But, an option to have dynamic variables is also nice.
2.5. Dynamic scoping
Implementing dynamic scoping is easy. During evaluation, we keep a stack that maps variable names to values. Each variable declaration pushes an entry into this stack and each variable use looks for the variable in this stack from top to bottom. This will pick-up the most recently declared variable of the same name.
If your language has no functions, then there is no difference between lexical and dynamic scoping.
The following class represents the programming construct let v be e
in f end
which binds the value e
to v
and then evaluates
f
. Here, v
must be a valid identifier and e
and f
must be
expressions.
@dataclass class Let(AST): v: str e: AST f: AST
We also need a class for occurences of variables within expressions.
@dataclass class Var(AST): v: str
The evaluation now requires a stack in addition to the AST. Usually, this stack is called the environment.
def lookup(env, v): for u, uv in reversed(env): if u == v: return uv raise ValueError("No value found.") def e(t: AST, env = None) -> int: if env is None: env = [] match t: case Var(v): return lookup(env, v) case Let(v, x, y): vv = e(x, env) env.append((v, vv)) yv = e(y, env) env.pop() return yv case Number(s): return int(s) case BinOp("+", l, r): return e(l, env) + e(r, env) assert False
Let us create two sample ASTs and test them.
expr_t4 = Let("a", Number("3"), BinOp("+", Var("a"), Var("a"))) expr_t5 = Let("a", Number("3"), Let("b", BinOp("+", Var("a"), Number("2")), BinOp("+", Var("a"), Var("b"))))
print(e(expr_t4)) print(e(expr_t5))
6 8
Now, we will add the surface syntax which will look like let a be 3
in a + a end
. I prefer keywords such as be
instead of symbols such
as =
or :=
as words are easier to type for me. However, when
choosing a keyword, we have to ensure that it is a word that will not
be used often as variable names. The word be
is not a common
variable name. So this choice is fine.
The lexer should have variable tokens.
@dataclass class VarToken(Token): v: str
After lexing a word, we have to lookup whether it is a keyword. If it is, then we return a keyword token. Otherwise, it is a variable.
from collections.abc import Iterator def lex(s: str) -> Iterator[Token]: i = 0 while True: while i < len(s) and s[i].isspace(): i = i + 1 if i >= len(s): return if s[i].isalpha(): t = s[i] i = i + 1 while i < len(s) and s[i].isalpha(): t = t + s[i] i = i + 1 if t in { "let", "be", "in", "end" }: yield KeywordToken(t) else: yield VarToken(t) elif s[i].isdigit(): t = s[i] i = i + 1 while i < len(s) and s[i].isdigit(): t = t + s[i] i = i + 1 yield NumberToken(t) else: match t := s[i]: case '+' | '*' | '<': i = i + 1 yield OperatorToken(t)
Let us test the lexer.
for t in lex("let a be 3 in a + a end"): print(t)
KeywordToken(w='let') VarToken(v='a') KeywordToken(w='be') NumberToken(v='3') KeywordToken(w='in') VarToken(v='a') OperatorToken(o='+') VarToken(v='a') KeywordToken(w='end')
The parser tries to parse a let
expression if it observes the token
let
at the beginning.
class ParseError(Exception): msg: str def parse(s: str) -> AST: from more_itertools import peekable t = peekable(lex(s)) def expect(what: Token): if t.peek(None) == what: next(t) return raise ParseError(f"Expected {what}") def parse_let(): match t.peek(None): case KeywordToken("let"): next(t) vt = next(t) expect(KeywordToken("be")) e = parse_let() expect(KeywordToken("in")) f = parse_let() expect(KeywordToken("end")) return Let(vt.v, e, f) case _: return parse_add() def parse_add(): ast = parse_atom() while True: match t.peek(None): case OperatorToken('+'): next(t) ast = BinOp('+', ast, parse_atom()) case _: return ast def parse_atom(): match t.peek(None): case NumberToken(v): next(t) return Number(v) case VarToken(v): next(t) return Var(v) return parse_let()
print(e(parse("let a be 3 in a + a end"))) print(e(parse("let a be 3 in let b be a + 2 in a + b end end")))
6 8
2.6. Functions
Let us implement functions. We need two new AST nodes. One for defining and another for calling functions.
@dataclass class Fun(AST): n: str # Name of function a: str # Parameter b: AST # Body e: AST # Function calls here @dataclass class Call(AST): n: str # Name of function a: AST # Argument
We want to write code such as:
expr_t6 = """ fun dbl(a) is a + a in dbl(2) + dbl(3) end """
to define a function and call it two times with two different arguments. It corresponds to the AST:
expr_t6ast = Fun ( "dbl", "a", BinOp("+", Var("a"), Var("a")), BinOp("+", Call("dbl", Number("2")), Call("dbl", Number("3"))) )
First, the evaluation. When a function is defined, we parse and put its body into an environment. When it is called, we evaluate the body in an environment where the parameter is bound to the argument.
def lookup(env, v): for u, uv in reversed(env): if u == v: return uv raise ValueError("No value found.") def e(t: AST, env = None) -> int: if env is None: env = [] match t: case Let(v, x, y): vv = e(x, env) env.append((v, vv)) yv = e(y, env) env.pop() return yv case Var(v): return lookup(env, v) case Fun(f, a, b, c): env.append((f, (a, b))) x = e(c, env) env.pop() return x case Call(f, x): a, b = lookup(env, f) env.append((a, e(x, env))) y = e(b, env) env.pop() return y case Number(s): return int(s) case BinOp("+", l, r): return e(l, env) + e(r, env)
Let us test this on the hand-crafted AST.
print(e(expr_t6ast))
10
The modifications to the front-end are straight-forward. It is left as an exercise.
2.7. Functions with static scoping
We want to create an AST for the following code:
let x be 5 in fun f(y) is x in fun g(z) is let x be 6 in f(z) end in g(0) end end end
and demonstrate that it evaluates to 6 with dynamic scoping. With static scoping, it would have evaluated to 5.
To implement static scoping. We add a resolve()
pass that renames
variables so that each variable has a unique name. For this, we
redefine the Variable
class.
@dataclass class Var(AST): v: str i: int = None
where the i
field holds the integer that will help us resolve
variables to bindings.
We redefine some of the AST nodes to store Var
instead of just a
string.
@dataclass class Let(AST): x: AST # Always a Var e: AST f: AST @dataclass class Fun(AST): f: str # Functions are second-class. x: AST # Parameter. b: AST e: AST
The AST for the above expression is:
expr_t7ast = Let ( Var("x"), Number("5"), Fun ( "f", Var("y"), Var("x"), Fun ( "g", Var("z"), Let ( Var("x"), Number("6"), Call("f", Var("x")) ), Call("g", Number("0")))))
Under dynamic scoping, we will get 6
. With static scoping, we should
get 5
.
Now, we make a function to create fresh integers everytime it is
called. This will help us fill in id
of Var
correctly.
def make_fresh(): i = 0 def fresh(): nonlocal i i = i + 1 return i return fresh
Finally, the resolve()
is another post-order traversal over the AST.
def resolve(t: AST, env = None, fresh = None) -> AST: if env is None: env = [] if fresh is None: fresh = make_fresh() match t: case Number(n): return Number(n) case Let(Var(x, _), e, f): er = resolve(e, env, fresh) env.append((x, i := fresh())) fr = resolve(f, env, fresh) env.pop() return Let(Var(x, i), er, fr) case Var(x, _): return Var(x, lookup(env, x)) case Call(f, x): xr = resolve(x, env, fresh) return Call(f, xr) case Fun(f, Var(x, _), b, y): env.append((x, i := fresh())) br = resolve(b, env, fresh) env.pop() yr = resolve(y, env, fresh) return Fun(f, Var(x, i), br, yr)
Observe the AST before resolution.
print(expr_t7ast)
Let(x=Var(v='x', i=None), e=Number(val='5'), f=Fun(f='f', x=Var(v='y', i=None), b=Var(v='x', i=None), e=Fun(f='g', x=Var(v='z', i=None), b=Let(x=Var(v='x', i=None), e=Number(val='6'), f=Call(n='f', a=Var(v='x', i=None))), e=Call(n='g', a=Number(val='0')))))
Every i
field is None
.
Let us now look at the resolved form of our earlier AST.
print(resolve(expr_t7ast))
Let(x=Var(v='x', i=1), e=Number(val='5'), f=Fun(f='f', x=Var(v='y', i=2), b=Var(v='x', i=1), e=Fun(f='g', x=Var(v='z', i=3), b=Let(x=Var(v='x', i=4), e=Number(val='6'), f=Call(n='f', a=Var(v='x', i=4))), e=Call(n='g', a=Number(val='0')))))
Observe how the x
in the body of f
gets matched to the x
introduced by the outermost let
through the i
field in Var
. Now,
we can adjust the e()
to lookup values based on both the variable
name and its id and obtain the correct variable as in static scoping.
2.8. Bytecode generation
Let us generate bytecode for a calculator. First, we have to define bytecode instructions, their format, and write a VM to execute the bytecode. We will use a simple two-byte format:
- First byte is the opcode.
- Second byte is the immediate operand if any.
We will write the bytecode VM in C. Let us define the opcodes.
enum op { HALT = 0, NOP, PUSH, POP, ADD, SUB, MUL, NEG, };
The interpreter evaluates and returns the final value on stack.
#include <stdio.h> #include <stdint.h> int execute(uint8_t *insns) { size_t ip = 0; int operand[100], top = 0; #define PUSH(x) (operand[top++] = x) #define POP(x) (operand[--top]) int l, r; while (1) { switch (insns[ip]) { case HALT: goto end; case NOP: break; case PUSH: PUSH(insns[ip+1]); break; case ADD: r = POP(); l = POP(); PUSH(l+r); break; case SUB: r = POP(); l = POP(); PUSH(l-r); break; case MUL: r = POP(); l = POP(); PUSH(l*r); break; case NEG: l = POP(); PUSH(-l); break; } ip += 2; // No control-flow. } end: return POP(); #undef PUSH #undef POP }
Now, let us test the interpreter with some hand-crafted bytecode.
enum op { HALT = 0, NOP, PUSH, POP, ADD, SUB, MUL, NEG, }; #include <stdio.h> #include <stdint.h> int execute(uint8_t *insns) { size_t ip = 0; int operand[100], top = 0; #define PUSH(x) (operand[top++] = x) #define POP(x) (operand[--top]) int l, r; while (1) { switch (insns[ip]) { case HALT: goto end; case NOP: break; case PUSH: PUSH(insns[ip+1]); break; case ADD: r = POP(); l = POP(); PUSH(l+r); break; case SUB: r = POP(); l = POP(); PUSH(l-r); break; case MUL: r = POP(); l = POP(); PUSH(l*r); break; case NEG: l = POP(); PUSH(-l); break; } ip += 2; // No control-flow. } end: return POP(); #undef PUSH #undef POP } int main() { uint8_t insns[] = { PUSH, 2, PUSH, 3, ADD , 0, // 0 is ignored. PUSH, 5, MUL , 0, HALT, 0, }; printf("%d\n", execute(insns)); return 0; }
25
Now, we just have to take an AST for an expression into an array of bytes and write it out to a file.
HALT, NOP, PUSH, POP, ADD, SUB, MUL, NEG = range(8) def do_codegen(t: AST, code): match t: case Number(v): # Careful. I don't handle large numbers here. code.append(PUSH) code.append(int(v)) case BinOp("+", l, r): do_codegen(l, code) do_codegen(r, code) code.append(ADD) code.append(0) case BinOp("-", l, r): do_codegen(l, code) do_codegen(r, code) code.append(SUB) code.append(0) case BinOp("*", l, r): do_codegen(l, code) do_codegen(r, code) code.append(MUL) code.append(0) return code def codegen(t): c = do_codegen(t, bytearray()) c.append(HALT) return c
Let us create a sample AST and check the output bytes.
expr_cg = BinOp ("*", BinOp("+", Number("2"), Number("3")), Number("5"))
And, finally:
print(codegen(expr_cg))
bytearray(b'\x02\x02\x02\x03\x04\x00\x02\x05\x06\x00\x00')
Connecting all these pieces by appropriate file-handling is left as an exercise.
2.9. Bytecode execution for call and return
We will write bytecode equivalent of:
fun f(x) is x + 1 in f(42) end
which evaluates to 43 to illustrate CALL and RETURN.
The bytecode instructions are:
class OC: JF, PUSH, ADD, CALL, RET, GET, HALT = range(1, 8)
The above code is then compiled into the following bytecode:
sample = [ (OC.JF, 5), # Jump to after the function body. (OC.GET, "x"), (OC.PUSH, 1), (OC.ADD, None), (OC.RET, None), (OC.PUSH, 42), (OC.GET, "f"), (OC.CALL, None), (OC.HALT, None) ]
To simplify discussion, I keep them as Python list. The real bytecode will be a sequence of bytes that encodes the above list.
We will discuss the bytecode evaluator first. The values in stack are either numbers or functions. We need objects corresponding to them.
from dataclasses import dataclass @dataclass class NumObj: n: int def __repr__(self): return f"{self.n}" @dataclass class FunObj: e: int # The entry point a: str # The parameter name
For executing functions, we need call frames.
@dataclass class CallFrame: f: FunObj # The function object v: dict # arguments and locals r: int # The return address
After compiling the above bytecode, the compiler should create the following function object to encode f.
f = FunObj(1, "x")
The code generator should create both the bytecode and a dictionary that maps names to corresponding global objects.
@dataclass class Code: b: list # The bytecode o: dict # The map
For our sample program:
code = Code(sample, { "f" : f })
Now, the function that actually executes the bytecode. We return the operand stack at the end to observe the final value.
def execute(code): ip = 0 # The current instruction. op = [] # Operand stack. Holds NumObjs or FunObjs. cs = [] # Call stack. Holds CallFrames. # We initialize the call stack with call frame of main() # In our example, the main() contains all code in the file # like in Python. c = CallFrame(FunObj(0, "_unused"), {}, None) c.v = code.o cs.append(c) while True: match code.b[ip]: case (OC.HALT, _): return op case (OC.JF, off): ip += off case (OC.PUSH, val): op.append(NumObj(val)) ip += 1 case (OC.ADD, _): left = op.pop() right = op.pop() op.append(NumObj(left.n+right.n)) ip += 1 case (OC.CALL, _): f = op.pop() x = op.pop() c = CallFrame(f, { f.a : x }, ip+1) cs.append(c) ip = f.e case (OC.RET, _): ip = cs[-1].r cs.pop() case (OC.GET, name): op.append(cs[-1].v[name]) ip += 1
Finally, we execute the code:
print(execute(code))
[43]
Let us write the function:
fun f(x) is x+1 in fun g(y) is f(y+2) in g(42) end end
The corresponding bytecode is:
sample2 = [ (OC.JF, 5), (OC.GET, "x"), (OC.PUSH, 1), (OC.ADD, None), (OC.RET, None), (OC.JF, 7), (OC.GET, "y"), (OC.PUSH, 2), (OC.ADD, None), (OC.GET, "f"), (OC.CALL, None), (OC.RET, None), (OC.PUSH, 42), (OC.GET, "g"), (OC.CALL, None), (OC.HALT, None) ]
The new code is:
code2 = Code(sample2, { "f" : FunObj(1, "x"), "g" : FunObj(6, "y") })
The call to f() in g() can only find the function object for f() in main()'s stack frame. So we need to search backwards when executing GET. This is the only change required in the execute().
def execute2(code): ip = 0 # The current instruction. op = [] # Operand stack. Holds NumObjs or FunObjs. cs = [] # Call stack. Holds CallFrames. # We initialize the call stack with call frame of main() # In our example, the main() contains all code in the file # like in Python. c = CallFrame(FunObj(0, "_unused"), {}, None) c.v = code.o cs.append(c) while True: match code.b[ip]: case (OC.HALT, _): return op case (OC.JF, off): ip += off case (OC.PUSH, val): op.append(NumObj(val)) ip += 1 case (OC.ADD, _): left = op.pop() right = op.pop() op.append(NumObj(left.n+right.n)) ip += 1 case (OC.CALL, _): f = op.pop() x = op.pop() c = CallFrame(f, { f.a : x }, ip+1) cs.append(c) ip = f.e case (OC.RET, _): ip = cs[-1].r cs.pop() case (OC.GET, name): for i in range(len(cs)-1, -1, -1): if name in cs[i].v: op.append(cs[i].v[name]) break ip += 1
print(execute2(code2))
[45]
The above kind of variable lookup is why in Python the following is:
def foo(): global x y = 0 for i in range(10000): y += x return y
slower than:
def foo(): global x z = x y = 0 for i in range(10000): y += z return y
2.10. Aside: Discriminating unions
The datatype double + double with a discriminating union is not the same as double. For example, temperature is a double+double and it cannot be collapsed into a double as it will lose the information whether it is in celsius or fahrenheit.
struct temperature { enum { CELSIUS, FAHRENHEIT } kind; union { double c; double f; }; };
In C, the programmer is expected to only access c when kind is CELSIUS. This is not enforced by the compiler.
The following Python types encode the same information. Unlike C, you can only access c when temperature is celsius.
from dataclasses import dataclass class Temperature: pass @dataclass class Celsius(Temperature): c: float @dataclass class Fahrenheit(Temperature): f: float