Compilers (2023)
Welcome to CS327: Compilers. In this course, we will design and implement a compiler for a simple imperative language. We will aim for the following language features.
- Basic Types: numbers, booleans, strings.
- Compound Types: products, sums, arrays.
- Conditionals.
- Loops.
- Functions.
- Closures.
- Mutable variables.
- Exceptions.
We will aim to implement a tree-walking interpreter followed by designing a bytecode along with a source to bytecode compiler that implements some optimizations such as function inlining, loop unrolling, constant folding, and dead code elimination.
My code for this course will be hosted at the github repository for CS327: Compilers (2023).
Current status #
Complete implementation for a PL with conditionals, loops, nested functions, and typechecking with both tree-walking and bytecode generation.
Project #
This section lists the goals of the project. Everyone should try to complete the minimal (M) goals. The goals marked intermediate (I) and advanced (A) can be taken up once the minimal goals are achieved.
- M: A number type and arithmetic.
- I: Multiple number types such as fractions and integers. Quotient and division are different. Quotient has type (integer, integer) -> integer and division has type (fraction, fraction) -> fraction. An integer can be used wherever a fraction can be used.
- M: Let expressions.
- I: Parallel
let
(Seelet..and
in Ocaml). - M: A Boolean type, comparisons, and if-else.
- I: An explicit unary boolifying operator. In Perl
so x
wherex
has any type produces a Boolean value. For example, ifx
is a number, it is true when non-zero. Ifx
is a string, it is true when non-empty. - M: Mutable variables.
- I: Static type checking. The expression
(5>3) + 2
should be an error without evaluating anything. - A: Disallow mutable variables to change type. With the binding
let mut p = True in ...
, the variablep
should only be assigned boolean values. - M: Strings with concatenation and slicing.
- M: A
print
operation that prints values to screen. - M: loops.
- M: Functions
- M: Lists with operations
cons
,is-empty?
,head
,tail
. - I:
for
loop to iterate over lists. - I: Mutable arrays with indexing, appending, popping, concatenation, element assignment.
- I: Allow declaration of type of array. For example
let xs: Array[int] = [] in ...
should preventxs[0] ← 5/3
. - A: Step-by-step debugger for your programming language.
- A: User-defined types – records.
- A: First-class functions.
A showcase of various programming languages designed by students.
Meetings #
21-04-2023 #
Resources for mathematical foundations. Feedback.
19-04-2023 #
Static links for accessing variables in outer functions.
12-04-2023 #
Functions with parameters. Calling convention. Function preamble.
05-04-2023 #
Parameter-less functions. Call frame management. CALL and RETURN instructions.
31-03-2023 #
Frames and local variables.
29-03-2023 #
Code walk-through for bytecode generation. See ‘unstable’ branch in the git repo. Expressions without variables are done.
22-03-2023 #
Exam II
17-03-2023 #
Bytecode generation for conditionals and loops.
15-03-2023 #
Stack VM, Bytecode generation for arithmetic expressions.
03-03-2023 #
Importance of formalizing passes. Parser generators (GNU Bison, Python parsimonious). Programming language shootout game. Code golf.
01-03-2023 #
Arrays vs Lists performance. Implementing arrays. Referential transparency. Copy-on-Write.
First-class functions and closures. Using heap to implement closures.
24-02-2023 #
Implementing variable resolution for proper behaviour.
22-02-2023 #
Pitfalls of Python style extreme late binding of symbols.
15-02-2023 #
Implementing function definitions and function calls.
13-02-2023 #
Exam I sheets loaned. Discussed project evaluation.
10-02-2023 #
We discussed name visibility of functions in various languages. Late binding in Python, forward declarations in C, and explicit syntactic form for mutual recursion in OCaml.
04-02-2023 #
Exam I.
03-02-2023 #
We implemented an imperative environment that handles scoping of mutable variables.
01-02-2023 #
We discussed how to write environments for handling mutable variables correctly.
27-01-2023 #
We discussed operator precedence, associativity, and how to write a grammar that captures these concepts. We then saw how one can write a parser for such grammars.
25-01-2023 #
We discussed parsing. Recursive descent parsing peeks at the first token and determines how to parse the string based on that. We discussed ambiguity in expression parsing and how recursive descent fails for arithmetic expressions in infix form.
20-01-2023 #
We discussed lexing where we produce a stream of tokens from a stream of characters (also known as source code).
18-01-2023 #
We discussed static analysis – analysis of programs without evaluating them. A part of it is semantic analysis which involves type checking. We discussed how to perform typechecking by tree-walking. We discussed language design choices such as expression+statements or expression-only languages and statically-typed vs dynamically-typed languages.
16-01-2023 #
We discussed project submission workflow. All individual contributions SHOULD be associated with their github (or equivalent) id. As a rough guideline for project goals, try to implement solutions for the first five questions from Project Euler in the language you are implementing.
13-01-2023 #
We implemented let
expressions in our interpreter by keeping track of environments that map names into values.
11-01-2023 #
We learned lexical scoping and let
bindings that introduce lexically scoped bindings.
09-01-2023 #
We wrote code for an arithmetic expression tree evaluator. We discussed internal representation of programs – Abstract Syntax Trees (ASTs). We then defined an AST suitable for a calculator in Python and wrote a tree-walk eval()
for it.
04-01-2023 #
Attendance is not mandatory. Grading will be divided into theory exams (50) and team project (50). There will be three assessments for both and the split will be (15 + 15 + 20). Each team may have four to five members including a team leader. You can choose any programming language. The history of the project should be tracked via git and commits should identify contribution. Before assessment, each team has to submit information regarding work done by each member. This will be compared against git commits and performance in TA evaluations. Theory exams will mostly be subjective and will test your understanding of concepts.
You can learn about using git via this github tutorial. Each team member should have their own fork of their project where they make their commits. Individual changes can be pushed to a main repository that will aggregate all contributions into the group’s output.
Python dataclasses, Python match statement, and pytest will be extensively used by me during lectures. For your project, feel free to use your preferred tools.
References #
- Robert Nystrom’s excellent book
- Wikibook on PLs
- Rosetta code for sample programs in various languages
- Programming language benchmarks comparing performance of various languages
- Project Euler for simple programs to test your language
- Reflections on trusting trust
- Smashing the stack for fun and profit
Showcase #
This section showcases various programming languages designed by students. The programs are solutions to Project Euler problem 14.
dino #
assign start = 3;
assign count = 1;
assign ans = 1;
assign max_count = 1;
assign dp = {1:1};
func collatz(n)
?echo(n);
if(dp.in_dict(n))
return dp[n];
end
if(n%2 == 0)
dp[n] = collatz(n//2) + 1;
end
else
dp[n] = collatz(3*n + 1) + 1;
end
return dp[n];
end
loop(start < 100)
? echo(start);
count = collatz(start);
if(count > max_count)
max_count = count;
ans = start;
end
start = start + 2;
end
echo(max_count);
echo(ans);
gossip #
declare n = 1;
declare k = 100000;
declare len = 1;
declare i = 1;
declare max = 0;
declare num = -1;
declare d = dictn(0);
while i <= k do {
assign n = i;
assign len = 1;
while n > 1 do {
if d[n]!=0 then {
assign len = len + d[n] -1;
assign n = 1;
}
else {
if n % 2 == 0 then {
assign n = n / 2;
assign len = len + 1;}
else {
assign n = 3 * n + 1;
assign n = n/2;
assign len = len + 2;
};
};
};
assign d[i] = len;
if len > max then {
assign max = len;
assign num = i;}
assign i = i + 1;
};
print(k);
print(num);
print(max);
notpy #
var i = 2;
var a = [1000000];
var max=0;
var maxnum=0;
while (i<1000000){
var start = i;
var cnt =1;
while (start !=1){
if (start < 1000000){
if (a[start] != 0){
cnt = cnt + a[start]-1;
start = 1;
}
else{
if (start %2 == 0){
start = start/2;
}
else{
start = 3*start + 1;
}
cnt = cnt+1;
}
}
else{
if (start%2==0){
start=start/2;
}
else{
start=3*start+1;
}
cnt=cnt+1;
}
}
if(cnt>max){max=cnt; maxnum=i;}
a[i] = cnt;
print i, cnt;
i=i+1;
}
print "ans", max, maxnum;
thorium #
let ans=0 in
let start=0 in
let num = 1 in
{
while num<=100000 do
{
let l=1 in
let n= num in
{
while n>1 do
{
if n%2==0 then {
n=n//2;
};
else{
n=3*n+1;
}; end
l=l+1;
};
if l>ans then {
ans= l;
start=num;
};
else{
ans=ans;
}; end
};
num=num+ 1;
};
print ans;
print start;
}
end
Zebra #
var int l = 1000000;
var array(int) a=array(int){};
var int mem = 1000000;
for(var int i=0;i<mem;i=i+1){
append(0,a);
}
var int maxChainLength = 0;
var int result = 0;
var int n = nil;
var int chainLength = nil;
var int flag = 0;
for (var int i = 1; i < l; i=i+1) {
n = i;
chainLength = 1;
flag = 0;
while (n > 1)
{
if (n<mem){
if (a[n]!=0){
chainLength = chainLength + a[n] -1;
flag = 1;
n = 0;
}
}
if (flag ==0){
if (n % 2 == 0) {
n = n//2;
}
else {
n = 3 * n + 1;
}
chainLength = chainLength + 1;
}
}
if (chainLength > maxChainLength) {
maxChainLength = chainLength;
result = i;
}
if (i<mem) {a[i]=chainLength;}
}
zout("result", result);
zout("len", maxChainLength);
Zoro #
var mil <- 1000;
var lst <- [1];
var i <- 2;
var maxl <- 1;
var maxi <- 1;
var temp <- 0;
var j <- 0;
while i <= mil
do
if i%2 == 0
then
temp <- 1 + lst.at (i//2 - 1);
lst.push temp;
else
j <- 3*i + 1;
temp <- 1;
while j >= i
do
if j % 2 == 0
then
j <- j // 2;
else
j <- 3*j + 1;
endif;
temp <- temp + 1;
endwhile;
temp <- temp + lst.at (j - 1);
lst.push temp;
endif;
if temp > maxl
then
maxi <- i;
maxl <- temp;
endif;
i <- i+1;
endwhile;
print mil;
print maxi;
print maxl;