Writing a Compiler from Scratch: Lessons from Ravs
1 Feb 2025
4 min read
Akshansh Gusain
The Journey Begins
There’s something magical about writing code that processes code. When I started building Ravs, my custom programming language, I had a vague understanding of how compilers worked. Months later, I’ve gained a deep appreciation for the elegant engineering that powers every program we run.
This post shares key insights from the journey - the challenges, the “aha!” moments, and the design decisions that shaped Ravs.
Phase 1: Lexical Analysis
The lexer transforms raw source code into tokens. It sounds simple, but edge cases lurk everywhere:
let message = "Hello, \"World\"!" // String with escaped quotes
let ratio = 3.14e-10 // Scientific notation
let 日本語 = "unicode" // Unicode identifiers
My initial lexer was a massive switch statement. It worked, but adding new token types required touching dozens of places. The rewrite used a table-driven approach:
var tokenRules = []Rule{
{Pattern: `\d+\.?\d*`, Type: NUMBER},
{Pattern: `"(?:[^"\\]|\\.)*"`, Type: STRING},
{Pattern: `[a-zA-Z_]\w*`, Type: IDENTIFIER},
}
Lesson Learned
Test with weird input early. The nastiest bugs came from edge cases I didn’t consider initially - comments inside strings, newlines in unexpected places, and emoji in identifiers.
Phase 2: Parsing
Parsing transforms tokens into an Abstract Syntax Tree (AST). I chose a recursive descent parser for Ravs because it’s intuitive and easy to debug.
The tricky part? Operator precedence. Consider:
2 + 3 * 4 // Should be 2 + (3 * 4) = 14, not (2 + 3) * 4 = 20
I implemented Pratt parsing for expressions, which elegantly handles precedence by associating binding powers with operators:
func (p *Parser) parseExpression(precedence int) Expression {
left := p.parsePrimaryExpression()
for precedence < p.currentPrecedence() {
operator := p.advance()
right := p.parseExpression(p.rightPrecedence(operator))
left = &BinaryExpression{left, operator, right}
}
return left
}
Lesson Learned
Error messages matter more than you think. Initially, parse errors just said “unexpected token”. After struggling to debug my own programs, I invested in proper error recovery and helpful messages:
Error at line 12, column 15:
let x = 5 +
^
Expected expression after '+'
Phase 3: Semantic Analysis
The AST tells us what the code says, but not what it means. Semantic analysis validates:
- Variables are declared before use
- Type compatibility in expressions
- Function calls match their definitions
For Ravs, I implemented a simple type inference system. The type checker walks the AST, building a symbol table and inferring types where possible:
// Ravs infers the type from the right-hand side
let count = 0 // Inferred as int
let name = "Ada" // Inferred as string
let mixed = count + name // Type error!
Phase 4: Code Generation
Finally, we generate executable code. Ravs compiles to bytecode for a custom virtual machine. The VM is stack-based, similar to the JVM or Python’s CPython:
// Source: let x = 1 + 2
PUSH 1
PUSH 2
ADD
STORE x
Writing the VM was surprisingly fun. Each opcode is just a case in a big switch statement, and watching your language execute real programs for the first time is incredibly satisfying.
What’s Next for Ravs
The core language works, but there’s always more to explore:
- Garbage Collection: Currently using reference counting, exploring mark-and-sweep
- Standard Library: Building out common data structures and I/O
- Optimizations: Constant folding, dead code elimination
Building a compiler taught me more about programming than any other project. If you’re curious about language internals, I highly recommend trying it yourself. Start small, iterate often, and embrace the complexity.
Check out Ravs on GitHub to see the code!