Compilers
Table of Contents
- 1. Current status
- 2. Project
- 3. Meetings
- 3.1. 21-04-2023
- 3.2. 19-04-2023
- 3.3. 12-04-2023
- 3.4. 05-04-2023
- 3.5. 31-03-2023
- 3.6. 29-03-2023
- 3.7. 22-03-2023
- 3.8. 17-03-2023
- 3.9. 15-03-2023
- 3.10. 03-03-2023
- 3.11. 01-03-2023
- 3.12. 24-02-2023
- 3.13. 22-02-2023
- 3.14. 15-02-2023
- 3.15. 13-02-2023
- 3.16. 10-02-2023
- 3.17. 04-02-2023
- 3.18. 03-02-2023
- 3.19. 01-02-2023
- 3.20. 27-01-2023
- 3.21. 25-01-2023
- 3.22. 20-01-2023
- 3.23. 18-01-2023
- 3.24. 16-01-2023
- 3.25. 13-01-2023
- 3.26. 11-01-2023
- 3.27. 09-01-2023
- 3.28. 04-01-2023
- 4. References
- 5. Showcase
Welcome to CS327: Compilers. In this course, we will design and implement a compiler for a simple imperative language. We will aim for the following language features.
- Basic Types: numbers, booleans, strings.
- Compound Types: products, sums, arrays.
- Conditionals.
- Loops.
- Functions.
- Closures.
- Mutable variables.
- Exceptions.
We will aim to implement a tree-walking interpreter followed by designing a bytecode along with a source to bytecode compiler that implements some optimizations such as function inlining, loop unrolling, constant folding, and dead code elimination.
My code for this course will be hosted at the github repository for CS327: Compilers (2023).
1. Current status
Complete implementation for a PL with conditionals, loops, nested functions, and typechecking with both tree-walking and bytecode generation.
2. Project
This section lists the goals of the project. Everyone should try to complete the minimal (M) goals. The goals marked intermediate (I) and advanced (A) can be taken up once the minimal goals are achieved.
- M: A number type and arithmetic.
- I: Multiple number types such as fractions and integers. Quotient and division are different. Quotient has type (integer, integer) -> integer and division has type (fraction, fraction) -> fraction. An integer can be used wherever a fraction can be used.
- M: Let expressions.
- I: Parallel
let
(Seelet..and
in Ocaml). - M: A Boolean type, comparisons, and if-else.
- I: An explicit unary boolifying operator. In Perl
so x
wherex
has any type produces a Boolean value. For example, ifx
is a number, it is true when non-zero. Ifx
is a string, it is true when non-empty. - M: Mutable variables.
- A: Disallow mutable variables to change type. With the binding
let mut p = True in ...
, the variablep
should only be assigned boolean values. - I: Static type checking. The expression
(5>3) + 2
should be an error without evaluating anything. - M: A
print
operation that prints values to screen. - M: loops.
- M: Strings with concatenation and slicing.
- M: Functions
- M: Lists with operations
cons
,is-empty?
,head
,tail
. - I:
for
loop to iterate over lists. - I: Mutable arrays with indexing, appending, popping, concatenation, element assignment.
- I: Allow declaration of type of array. For example
let xs: Array[int] = [] in ...
should preventxs[0] ← 5/3
. - A: Step-by-step debugger for your programming language.
- A: User-defined types – records.
- A: First-class functions.
A showcase of various programming languages designed by students.
3. Meetings
3.1. 21-04-2023
Resources for mathematical foundations. Feedback.
3.2. 19-04-2023
Static links for accessing variables in outer functions.
3.3. 12-04-2023
Functions with parameters. Calling convention. Function preamble.
3.4. 05-04-2023
Parameter-less functions. Call frame management. CALL and RETURN instructions.
3.5. 31-03-2023
Frames and local variables.
3.6. 29-03-2023
Code walk-through for bytecode generation. See 'unstable' branch in the git repo. Expressions without variables are done.
3.7. 22-03-2023
Exam II
3.8. 17-03-2023
Bytecode generation for conditionals and loops.
3.9. 15-03-2023
Stack VM, Bytecode generation for arithmetic expressions.
3.10. 03-03-2023
Importance of formalizing passes. Parser generators (GNU Bison, Python parsimonious). Programming language shootout game. Code golf.
3.11. 01-03-2023
Arrays vs Lists performance. Implementing arrays. Referential transparency. Copy-on-Write.
First-class functions and closures. Using heap to implement closures.
3.12. 24-02-2023
Implementing variable resolution for proper behaviour.
3.13. 22-02-2023
Pitfalls of Python style extreme late binding of symbols.
3.14. 15-02-2023
Implementing function definitions and function calls.
3.15. 13-02-2023
Exam I sheets loaned. Discussed project evaluation.
3.16. 10-02-2023
We discussed name visibility of functions in various languages. Late binding in Python, forward declarations in C, and explicit syntactic form for mutual recursion in OCaml.
3.17. 04-02-2023
Exam I.
3.18. 03-02-2023
We implemented an imperative environment that handles scoping of mutable variables.
3.19. 01-02-2023
We discussed how to write environments for handling mutable variables correctly.
3.20. 27-01-2023
We discussed operator precedence, associativity, and how to write a grammar that captures these concepts. We then saw how one can write a parser for such grammars.
3.21. 25-01-2023
We discussed parsing. Recursive descent parsing peeks at the first token and determines how to parse the string based on that. We discussed ambiguity in expression parsing and how recursive descent fails for arithmetic expressions in infix form.
3.22. 20-01-2023
We discussed lexing where we produce a stream of tokens from a stream of characters (also known as source code).
3.23. 18-01-2023
We discussed static analysis – analysis of programs without evaluating them. A part of it is semantic analysis which involves type checking. We discussed how to perform typechecking by tree-walking. We discussed language design choices such as expression+statements or expression-only languages and statically-typed vs dynamically-typed languages.
3.24. 16-01-2023
We discussed project submission workflow. All individual contributions SHOULD be associated with their github (or equivalent) id. As a rough guideline for project goals, try to implement solutions for the first five questions from Project Euler in the language you are implementing.
3.25. 13-01-2023
We implemented let
expressions in our interpreter by keeping track
of environments that map names into values.
3.26. 11-01-2023
We learned lexical scoping and let
bindings that introduce lexically
scoped bindings.
3.27. 09-01-2023
We wrote code for an arithmetic expression tree evaluator. We
discussed internal representation of programs – Abstract Syntax Trees
(ASTs). We then defined an AST suitable for a calculator in Python and
wrote a tree-walk eval()
for it.
3.28. 04-01-2023
Attendance is not mandatory. Grading will be divided into theory exams (50) and team project (50). There will be three assessments for both and the split will be (15 + 15 + 20). Each team may have four to five members including a team leader. You can choose any programming language. The history of the project should be tracked via git and commits should identify contribution. Before assessment, each team has to submit information regarding work done by each member. This will be compared against git commits and performance in TA evaluations. Theory exams will mostly be subjective and will test your understanding of concepts.
You can learn about using git via this github tutorial. Each team member should have their own fork of their project where they make their commits. Individual changes can be pushed to a main repository that will aggregate all contributions into the group's output.
Python dataclasses, Python match statement, and pytest will be extensively used by me during lectures. For your project, feel free to use your preferred tools.
4. References
- Robert Nystrom's excellent book
- Wikibook on PLs
- Rosetta code for sample programs in various languages
- Programming language benchmarks comparing performance of various languages
- Project Euler for simple programs to test your language
- Reflections on trusting trust
- Smashing the stack for fun and profit
5. Showcase
This section showcases various programming languages designed by students. The programs are solutions to Project Euler problem 14.
5.1. dino
assign start = 3; assign count = 1; assign ans = 1; assign max_count = 1; assign dp = {1:1}; func collatz(n) ?echo(n); if(dp.in_dict(n)) return dp[n]; end if(n%2 == 0) dp[n] = collatz(n//2) + 1; end else dp[n] = collatz(3*n + 1) + 1; end return dp[n]; end loop(start < 100) ? echo(start); count = collatz(start); if(count > max_count) max_count = count; ans = start; end start = start + 2; end echo(max_count); echo(ans);
5.2. gossip
declare n = 1; declare k = 100000; declare len = 1; declare i = 1; declare max = 0; declare num = -1; declare d = dictn(0); while i <= k do { assign n = i; assign len = 1; while n > 1 do { if d[n]!=0 then { assign len = len + d[n] -1; assign n = 1; } else { if n % 2 == 0 then { assign n = n / 2; assign len = len + 1;} else { assign n = 3 * n + 1; assign n = n/2; assign len = len + 2; }; }; }; assign d[i] = len; if len > max then { assign max = len; assign num = i;} assign i = i + 1; }; print(k); print(num); print(max);
5.3. notpy
var i = 2; var a = [1000000]; var max=0; var maxnum=0; while (i<1000000){ var start = i; var cnt =1; while (start !=1){ if (start < 1000000){ if (a[start] != 0){ cnt = cnt + a[start]-1; start = 1; } else{ if (start %2 == 0){ start = start/2; } else{ start = 3*start + 1; } cnt = cnt+1; } } else{ if (start%2==0){ start=start/2; } else{ start=3*start+1; } cnt=cnt+1; } } if(cnt>max){max=cnt; maxnum=i;} a[i] = cnt; print i, cnt; i=i+1; } print "ans", max, maxnum;
5.4. thorium
let ans=0 in let start=0 in let num = 1 in { while num<=100000 do { let l=1 in let n= num in { while n>1 do { if n%2==0 then { n=n//2; }; else{ n=3*n+1; }; end l=l+1; }; if l>ans then { ans= l; start=num; }; else{ ans=ans; }; end }; num=num+ 1; }; print ans; print start; } end
5.5. Zebra
var int l = 1000000; var array(int) a=array(int){}; var int mem = 1000000; for(var int i=0;i<mem;i=i+1){ append(0,a); } var int maxChainLength = 0; var int result = 0; var int n = nil; var int chainLength = nil; var int flag = 0; for (var int i = 1; i < l; i=i+1) { n = i; chainLength = 1; flag = 0; while (n > 1) { if (n<mem){ if (a[n]!=0){ chainLength = chainLength + a[n] -1; flag = 1; n = 0; } } if (flag ==0){ if (n % 2 == 0) { n = n//2; } else { n = 3 * n + 1; } chainLength = chainLength + 1; } } if (chainLength > maxChainLength) { maxChainLength = chainLength; result = i; } if (i<mem) {a[i]=chainLength;} } zout("result", result); zout("len", maxChainLength);
5.6. Zoro
var mil <- 1000; var lst <- [1]; var i <- 2; var maxl <- 1; var maxi <- 1; var temp <- 0; var j <- 0; while i <= mil do if i%2 == 0 then temp <- 1 + lst.at (i//2 - 1); lst.push temp; else j <- 3*i + 1; temp <- 1; while j >= i do if j % 2 == 0 then j <- j // 2; else j <- 3*j + 1; endif; temp <- temp + 1; endwhile; temp <- temp + lst.at (j - 1); lst.push temp; endif; if temp > maxl then maxi <- i; maxl <- temp; endif; i <- i+1; endwhile; print mil; print maxi; print maxl;