A toy language compiler written in Python. For educational purposes.
ToyC is a simple, educational compiler for a C‑inspired toy language, implemented in Python. It demonstrates the end‑to‑end process of parsing source code, building an abstract syntax tree (AST), generating intermediate representation (IR) using LLVM via llvmlite, and emitting executable binaries. ToyC is designed to help learners understand compiler construction fundamentals in a hands-on way.
- Parser and AST: ToyC includes a hand‑written parser that converts source code into an AST.
- LLVM IR Generation: Uses llvmlite to lower the AST into LLVM IR, including support for variables, arithmetic, conditionals (if/else), labels/gotos, and functions.
- Code Emission: Generates object files or executables by invoking LLVM’s code generation pipeline.
- Educational Focus: Clear, well‑documented code, with each compiler stage visible to the learner.
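The parse → AST → IR flow described above can be shown in miniature. The following is a self-contained sketch, not ToyC's code: it handles only integer +, -, * and parentheses, uses tuples as a stand-in AST, and writes LLVM IR as plain text instead of going through llvmlite so that it runs with the standard library alone. All names (tokenize, Parser, lower, compile_to_ir) are hypothetical.

```python
import re

# Hypothetical miniature of the parse -> AST -> IR flow; not ToyC's actual code.
TOKEN_RE = re.compile(r"\d+|[-+*()]")


def tokenize(src):
    """Split an expression such as '2 + 3 * 4' into number and operator tokens."""
    tokens = TOKEN_RE.findall(src)
    # Anything the pattern skipped (other than whitespace) is an error.
    if "".join(tokens) != "".join(src.split()):
        raise SyntaxError(f"unexpected character in {src!r}")
    return tokens


class Parser:
    """Recursive-descent parser; '*' binds tighter than '+' and '-'."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expr(self):  # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):  # term := factor ('*' factor)*
        node = self.factor()
        while self.peek() == "*":
            self.take()
            node = ("*", node, self.factor())
        return node

    def factor(self):  # factor := INT | '(' expr ')'
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return ("num", int(tok))
        raise SyntaxError(f"unexpected token {tok!r}")


def lower(node, lines, counter):
    """Append SSA instructions for node to lines; return the operand holding its value."""
    if node[0] == "num":
        return str(node[1])
    op, lhs, rhs = node
    a = lower(lhs, lines, counter)
    b = lower(rhs, lines, counter)
    counter[0] += 1
    name = f"%t{counter[0]}"
    opcode = {"+": "add", "-": "sub", "*": "mul"}[op]
    lines.append(f"  {name} = {opcode} i32 {a}, {b}")
    return name


def compile_to_ir(src):
    """Compile an integer expression into an LLVM IR main() that returns its value."""
    parser = Parser(tokenize(src))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    body = []
    result = lower(tree, body, [0])
    return "\n".join(["define i32 @main() {", "entry:", *body, f"  ret i32 {result}", "}"])


print(compile_to_ir("2 + 3 * 4"))
```

Running it prints a main function whose body is %t1 = mul i32 3, 4, then %t2 = add i32 2, %t1, then ret i32 %t2: the recursive lowering emits operands before the operation that uses them, so precedence in the AST becomes instruction order in the IR.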
- Python 3.8 or newer
- llvmlite (for IR-level code generation)
- LLVM toolchain installed (e.g., clang, llc, llvm-as, llvm-ld) if emitting final binaries
- poetry for Python package management
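Before running the AOT path it can help to confirm the toolchain is reachable. The sketch below assumes clang and llc are the tools that matter and shows one conventional way to turn a textual .ll file into an executable (llc to an object file, then clang to link). check_environment and aot_commands are hypothetical helpers, and ToyC itself may emit objects through llvmlite rather than calling these binaries.

```python
import shutil
import sys
from pathlib import Path

# Hypothetical helpers; the tool names come from the requirements list, but
# ToyC may locate or invoke its toolchain differently.
REQUIRED_TOOLS = ("clang", "llc")


def check_environment(tools=REQUIRED_TOOLS):
    """Return a list of problems with the local setup (empty when all is well)."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or newer is required")
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


def aot_commands(ll_path, exe_path):
    """Build (but do not run) commands that turn a textual .ll file into an executable."""
    obj_path = str(Path(ll_path).with_suffix(".o"))
    return [
        # llc lowers LLVM IR to a native object file; PIC keeps it linkable on
        # toolchains that build position-independent executables by default.
        ["llc", "-filetype=obj", "-relocation-model=pic", ll_path, "-o", obj_path],
        # clang drives the system linker and adds the C runtime startup code.
        ["clang", obj_path, "-o", exe_path],
    ]


for problem in check_environment():
    print("warning:", problem)
```

Each command list can be passed to subprocess.run(cmd, check=True) once check_environment() returns an empty list.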
- Clone the repository:
  git clone https://github.com/YourUsername/toyc.git
  cd toyc
- Create a virtual environment (recommended):
  poetry shell
- Install Python dependencies:
  poetry install
ToyC’s syntax is loosely based on a subset of C. The current implementation supports:
- Variable Declarations:
  let x: i32 = 10;
  let flag: bool = true;
- Arithmetic and Boolean Operations:
  let sum: i32 = a + b;
  let result: bool = (x < 5) && (y >= 2);
- Return Statements:
  return x;
- If/Else:
  if x < 10 { x = x + 1; } else { x = x - 1; }
- Labels & Goto:
  label loop: if x < 5 { x = x + 1; goto loop; }
- Function Definition:
  fn add(a: i32, b: i32): i32 { return a + b; }
- main Function Entry Point:
  The compiler expects an entry point of the form fn main(): i32 { ... }. The returned integer is used as the program’s exit code.
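A hand-written parser usually starts with a lexer that turns the syntax above into tokens. This is a regex-based sketch, not ToyC's parser.py: the keyword set is read off the examples, and the extra operators (||, ==, !=, >, /) are assumptions rather than documented features.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str  # "KEYWORD", "IDENT", "INT", "OP" or "PUNCT"
    text: str
    line: int


# Keywords, types and literals taken from the syntax examples.
KEYWORDS = {"fn", "let", "if", "else", "return", "label", "goto",
            "i32", "bool", "true", "false"}

# Alternatives are tried left to right, so two-character operators come first.
# Operators beyond those in the examples (||, ==, !=, >, /) are assumptions.
TOKEN_SPEC = [
    ("WS", r"[ \t\r]+"),
    ("NEWLINE", r"\n"),
    ("INT", r"\d+"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"&&|\|\||<=|>=|==|!=|[-+*/<>=]"),
    ("PUNCT", r"[(){}:;,]"),
    ("ERROR", r"."),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(src: str) -> List[Token]:
    """Convert ToyC source text into a flat list of tokens with line numbers."""
    tokens = []
    line = 1
    for match in MASTER_RE.finditer(src):
        kind, text = match.lastgroup, match.group()
        if kind == "NEWLINE":
            line += 1
        elif kind == "ERROR":
            raise SyntaxError(f"line {line}: unexpected character {text!r}")
        elif kind != "WS":
            if kind == "NAME":
                kind = "KEYWORD" if text in KEYWORDS else "IDENT"
            tokens.append(Token(kind, text, line))
    return tokens


print(lex("let flag: bool = true;"))
```

Keeping the line number on every token lets later stages report errors, such as an undefined identifier, at a useful location.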
ToyC’s main driver script is main.py. To compile a .tc source file:
poetry run python main.py path/to/source.tc
Or, to JIT-compile the program instead:
poetry run python main.py path/to/source.tc --jit
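The two invocations above differ only in the --jit flag. A minimal argparse sketch of such a command line follows; build_arg_parser is a hypothetical name, and main.py may parse its arguments differently or accept more options.

```python
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    """Hypothetical sketch of main.py's command line; only --jit is documented."""
    parser = argparse.ArgumentParser(
        prog="main.py", description="Compile a ToyC (.tc) source file."
    )
    parser.add_argument("source", help="path to the .tc file to compile")
    parser.add_argument(
        "--jit",
        action="store_true",
        help="JIT-compile and run the program instead of building an executable",
    )
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_arg_parser().parse_args(["examples/loop.tc", "--jit"])
mode = "JIT" if args.jit else "AOT"
print(f"would compile {args.source} in {mode} mode")
```

Because --jit takes no value, argparse also accepts it before the path, as in main.py --jit path/to/source.tc.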
This will:
- Parse source.tc into an AST.
- Perform semantic checks (type validation, undefined identifiers, etc.).
- Lower the AST into LLVM IR using llvmlite.
- Either JIT-compile and execute the IR (with --jit) or compile it ahead of time (AOT) into an executable.
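The semantic-check step can be illustrated with a small type checker. This sketch assumes a trimmed-down AST (literals, variables, binary operators, let and assignment) and typing rules inferred from the syntax examples: +, - and * take and produce i32, comparisons turn two i32 values into a bool, and && combines two bools. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


# Hypothetical, trimmed-down AST: literals, variables and binary operators.
@dataclass
class Num:
    value: int


@dataclass
class BoolLit:
    value: bool


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Num, BoolLit, Var, BinOp]


@dataclass
class Let:  # let name: ty = value;
    name: str
    ty: str
    value: Expr


@dataclass
class Assign:  # name = value;
    name: str
    value: Expr


class SemanticError(Exception):
    pass


ARITH = {"+", "-", "*"}           # i32, i32 -> i32
COMPARE = {"<", "<=", ">", ">="}  # i32, i32 -> bool
LOGIC = {"&&"}                    # bool, bool -> bool


def type_of(expr: Expr, scope: Dict[str, str]) -> str:
    """Infer an expression's type, rejecting undefined names and mismatched operands."""
    if isinstance(expr, Num):
        return "i32"
    if isinstance(expr, BoolLit):
        return "bool"
    if isinstance(expr, Var):
        if expr.name not in scope:
            raise SemanticError(f"undefined identifier '{expr.name}'")
        return scope[expr.name]
    lhs, rhs = type_of(expr.lhs, scope), type_of(expr.rhs, scope)
    if expr.op in ARITH or expr.op in COMPARE:
        operand, result = "i32", ("i32" if expr.op in ARITH else "bool")
    elif expr.op in LOGIC:
        operand, result = "bool", "bool"
    else:
        raise SemanticError(f"unknown operator '{expr.op}'")
    if lhs != operand or rhs != operand:
        raise SemanticError(f"'{expr.op}' expects {operand} operands, got {lhs} and {rhs}")
    return result


def check_block(stmts: List[Union[Let, Assign]]) -> Dict[str, str]:
    """Check statements in order and return the final name -> type scope."""
    scope: Dict[str, str] = {}
    for stmt in stmts:
        actual = type_of(stmt.value, scope)  # checked before the name is bound
        if isinstance(stmt, Let):
            expected = stmt.ty
        elif stmt.name in scope:
            expected = scope[stmt.name]
        else:
            raise SemanticError(f"undefined identifier '{stmt.name}'")
        if actual != expected:
            raise SemanticError(f"'{stmt.name}' has type {expected} but is assigned {actual}")
        scope[stmt.name] = expected
    return scope
```

Checking the right-hand side before binding the name means let x: i32 = x; is reported as an undefined identifier rather than silently accepted.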
For example, to see quick JIT results:
poetry run python main.py examples/loop.tc --jit
Here are some examples of ToyC programs:
fn main(): i32 {
let a: i32 = 1;
return a;
}
Compiles, prints nothing, and returns exit code 1.
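One common way to lower this program is to give each let its own stack slot and to load the value back for the return. The sketch below emits such IR as text; it is a hypothetical illustration (the function name and the %name.addr / %retval naming are invented), not necessarily what codegen.py produces, and its opaque ptr syntax assumes LLVM 15 or newer.

```python
from typing import List, Tuple


def lower_lets_and_return(lets: List[Tuple[str, int]], ret_name: str) -> str:
    """Emit LLVM IR text for `let NAME: i32 = CONST;` lines followed by `return NAME;`.

    Hypothetical lowering: each variable gets a stack slot (alloca) that is
    written with store and read back with load. LLVM's mem2reg pass can later
    promote such slots to SSA registers. Uses opaque `ptr` syntax (LLVM 15+).
    """
    slots = {}
    lines = ["define i32 @main() {", "entry:"]
    for name, value in lets:
        if name in slots:
            raise ValueError(f"'{name}' declared twice")
        slot = f"%{name}.addr"
        lines.append(f"  {slot} = alloca i32")
        lines.append(f"  store i32 {value}, ptr {slot}")
        slots[name] = slot
    if ret_name not in slots:
        raise NameError(f"undefined identifier '{ret_name}'")
    lines.append(f"  %retval = load i32, ptr {slots[ret_name]}")
    lines.append("  ret i32 %retval")
    lines.append("}")
    return "\n".join(lines)


print(lower_lets_and_return([("a", 1)], "a"))
```

The printed function stores 1 into %a.addr, loads it into %retval and returns it, which is where the exit code 1 comes from.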
fn factorial(n: i32): i32 {
if n <= 1 {
return 1;
}
let sub: i32 = n - 1;
let rec: i32 = factorial(sub);
return n * rec;
}
fn main(): i32 {
let result: i32 = factorial(5);
return result;
}
Computes 5! = 120 and returns 120 as the exit code.
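120 fits in a process exit status, but larger results would not: on POSIX systems a parent process (for example the shell, via $?) sees only the low 8 bits of the value main returns. The Python mirror below is a sketch for checking expected exit codes; it ignores i32 overflow (12! is the largest factorial that fits in a signed 32-bit integer), and how ToyC handles overflow is not documented here.

```python
def factorial(n: int) -> int:
    """Python mirror of the ToyC factorial example (same base case and recursion)."""
    if n <= 1:
        return 1
    return n * factorial(n - 1)


def posix_exit_status(value: int) -> int:
    """A waiting parent on POSIX systems sees only the low 8 bits of the exit value."""
    return value & 0xFF


print(factorial(5), posix_exit_status(factorial(5)))  # 120 120
print(factorial(6), posix_exit_status(factorial(6)))  # 720 208
```

So changing the example to factorial(6) would make the shell report 208 rather than 720.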
fn main(): i32 {
let i: i32 = 0;
label loop:
if i < 3 {
i = i + 1;
goto loop;
}
return i;
}
Demonstrates labels, conditional branches, and gotos.
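Labels and gotos map naturally onto LLVM basic blocks: each label starts a new block, and because every LLVM block must end in a terminator such as br or ret, code that falls through into a label needs an explicit branch. The sketch below applies that rule to the loop example, flattened by hand into a hypothetical instruction format (the tuple encoding, split_into_blocks, and the then/endif labels are all invented for illustration).

```python
from typing import Dict, List, Tuple

Instr = Tuple  # hypothetical flat instruction such as ("goto", "loop")
TERMINATORS = {"goto", "cond_goto", "return"}


def split_into_blocks(instrs: List[Instr]) -> Dict[str, List[Instr]]:
    """Group a flat instruction list into basic blocks keyed by label.

    Each ("label", name) starts a new block. LLVM requires every basic block to
    end in a terminator, so a block that would fall through into the next label
    gets an explicit ("goto", next_label). Assumes no instructions follow a
    goto or return before the next label.
    """
    blocks: Dict[str, List[Instr]] = {"entry": []}
    current = "entry"
    for instr in instrs:
        if instr[0] != "label":
            blocks[current].append(instr)
            continue
        name = instr[1]
        if name in blocks:
            raise ValueError(f"duplicate label '{name}'")
        body = blocks[current]
        if not body or body[-1][0] not in TERMINATORS:
            body.append(("goto", name))  # make the fall-through explicit
        blocks[name] = []
        current = name
    return blocks


# The loop example, flattened by hand. "then" and "endif" stand in for labels
# a compiler would generate for the if statement.
LOOP_EXAMPLE = [
    ("assign", "i", "0"),
    ("label", "loop"),
    ("cond_goto", "i < 3", "then", "endif"),
    ("label", "then"),
    ("assign", "i", "i + 1"),
    ("goto", "loop"),
    ("label", "endif"),
    ("return", "i"),
]

for label, body in split_into_blocks(LOOP_EXAMPLE).items():
    print(f"{label}: {body}")
```

From there each block becomes an LLVM basic block: goto turns into an unconditional br label %loop, and the signed comparison i < 3 into an icmp slt followed by a conditional br i1 to %then or %endif.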
ToyC/
├── codegen.py # IR lowering and LLVM builder logic
├── parser.py # Lexing and parsing into AST
├── ast.py # AST node definitions
├── main.py # Compiler driver (parsing → IR → emit)
├── examples/ # Sample .tc source files
└── README.md # This file
- ast.py: Defines classes for expressions, statements, function declarations, etc.
- parser.py: Converts *.tc text into an instance of the AST.
- codegen.py: Lowers AST nodes into LLVM IR using llvmlite’s ir builder.
- main.py: Entry point that ties all stages together and writes out the final IR.
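To illustrate the kind of node classes ast.py holds, here is a sketch using dataclasses, together with a small pass that validates goto targets. The class names and fields are hypothetical, as is the assumption that labels are function-scoped (as in C); expressions are left as plain strings to keep it short.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


# Hypothetical statement and function nodes in the spirit of ast.py; expressions
# are kept as plain source strings to keep the sketch short.
@dataclass
class Label:
    name: str


@dataclass
class Goto:
    target: str


@dataclass
class Return:
    value: str


@dataclass
class If:
    cond: str
    then_body: List["Stmt"]
    else_body: List["Stmt"] = field(default_factory=list)


Stmt = Union[Label, Goto, Return, If]


@dataclass
class FnDecl:
    name: str
    params: List[Tuple[str, str]]  # (name, type) pairs, e.g. ("a", "i32")
    return_type: str
    body: List[Stmt]


def walk(stmts):
    """Yield every statement, descending into both branches of if/else."""
    for stmt in stmts:
        yield stmt
        if isinstance(stmt, If):
            yield from walk(stmt.then_body)
            yield from walk(stmt.else_body)


def check_gotos(fn: FnDecl) -> List[str]:
    """Report duplicate labels and gotos whose target is never declared.

    Assumes labels are function-scoped (as in C), so a goto may jump to a
    label declared anywhere in the same function.
    """
    labels = [s.name for s in walk(fn.body) if isinstance(s, Label)]
    dupes = sorted({n for n in labels if labels.count(n) > 1})
    errors = [f"duplicate label '{n}'" for n in dupes]
    for s in walk(fn.body):
        if isinstance(s, Goto) and s.target not in labels:
            errors.append(f"goto to undefined label '{s.target}'")
    return errors
```

Running this check before IR lowering means codegen never has to branch to a basic block that does not exist.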
This project is released under the Apache License 2.0. See LICENSE for details.