Tools for Compiler Construction
We are trying to create and implement
(and eventually also use) a new programming language for
the implementation of logical algorithms and computer algebra.
Logic is special because it requires tree structures that
can have many different forms. In my view,
these data structures reveal a weakness that all current
programming languages with static type checking have:
We don't know how to statically type check
union or partiality.
The problem of partiality exists in many applications,
but it shows up most clearly
in the implementation of logic.
All existing approaches (union, std::variant, inheritance, type casts,
std::optional, std::expected, use of nullptr,
use of special value like std::string::npos) are unsatisfactory,
because they cannot be statically type checked.
Checking union and partiality is currently impossible due to
unsolved theoretical questions, which I hope to solve.
Despite these unsolved theoretical problems,
the project resulted in techniques and software that
make it easy to implement logic in C++, so that in a broader
sense, the problem of efficiently implementing logic was solved.
Students have implemented proof checkers, and
are using the developed techniques.
The following software is available on this page:
- A parser generator for C++,
- a tokenizer generator for C++, and
- a generator for tree-like data types in C++.
The parser generator works like Yacc, but supports
C++ and RAII. It creates LALR parsers, and in addition
allows extending the language at run time, which
is needed for example by Prolog.
The tokenizer generator does not create a complete tokenizer,
but only something called classifier.
The classifier reads a part of the input
and determines to which type of symbol it belongs.
Generating a full tokenizer would result in a generator that
is too rigid to be useful. It is possible to extend the
generated classifier by hand. This is needed for non-regular
tokens, or for tokenizers that consider indentation.
The tree-type generator reads a set of specifications
of tree-like (algebraic) datatypes, and creates C++ implementations
of these datatypes.
The specifications are as concise as in a functional language
(if not more concise), while at the same time compiled
to efficient C++ code. The generated tree classes have
an API that is pleasant to use.
In addition to materials (code and documents)
directly related to the development of our programming language
and implementation of logic, this site also contains slide
sets about topics related to compiler construction.
The slides were written while teaching compiler construction
courses at
Instytut Informatyki Uniwersytetu
Wrocławskiego
and the
School of Engineering and Digital Sciences at Nazarbayev University.
Parser and Tokenizer Generation for C++ (Maphoon)
After having looked at existing parser generators, I concluded
that none of them fulfilled the needs of my project, so I
decided to write my own.
I tried to design it in such a way that it will be useful
for other compiler implementors as well.
The parser generator is called Maphoon.
In addition to Maphoon, I also created a tokenizer generator.
Both can be obtained from
here.
Tree Generator
TreeGen is a C++ program that automatically
creates C++ class definitions from definitions of recursive
data types. TreeGen is by itself written in C++.
It can be obtained from
here.
Turing Machine Simulator
Using Maphoon, we created a simulator for Turing machines.
It is fast, easy to use, and
it can handle non-deterministic Turing machines.
It can be obtained from here.
Presentation at C++ Now (2023)
I gave a presentation at
C++Now 2023
with title 'Trees for Logic and Parsing'.
The slides and the code fragments that were shown in the lecture are
here,
and
here is a recording of the presentation.
Note, however that the tree generator has been improved significantly
since the presentation.
Presentation at C++-now (2022)
I gave a lecture at
C++Now 2022 about
parsing, tokenizing and Maphoon.
The slides and the code fragments that were shown in the lecture are
here,
and
this is a recording of the presentation.
Slide Sets
Below are some slides that may be relevant for
teaching compiler construction or developing a compiler:
Attribute Grammars
Slides about attribute grammars.
Top-Down Parsing
One usually has to redesign the grammar in order to make it suitable
for top-down parsing. One must merge common prefixes in rules,
and remove left recursion.
Slides
Bottom-Up Parsing
Slides about bottom-up parsing.
Read these, if you want to understand the theory behind Maphoon.
Semantic Analysis
These slides
give an overview of type checking for C.
The slides are a bit simplified, but contain all the main
features of C, like the lval/rval distinction and the implicit
conversions from array to pointer.
Translation of C into Stack Machine
The first set of slides
introduces an untyped stack machine.
The second set of slides
adds types to the stack machine, and explains how to translate
C into the stack machine.
Lambda Calculus
In Fall 2022, I taught lambda calculus to master students.
Here
are slides about how to define inductive types in
lambda calculus, and
these are slides about the Hindley-Milner
type checking
algorithm. I think that the natural way to present the HM algorithm is by
using a clause based approach.
Author
This page was created by
Hans de Nivelle.
Acknowledgements
Thanks to
Danel Batyrbek, Witold Charatonik, Aleksandra Kireeva, Tatyana Korotkova,
Akhmetzhan Kussainov, Dina Muktubayeva, Cláudia Nalon, Olzhas Zhangeldinov.
Special thanks for Zoltan Teki for extensively testing Maphoon, and
requesting additional features. He found a bug in the tokenizer generator,
and thanks to him Maphoon now supports move-only attributes.
This research was sponsored by
Nazarbayev University through the Faculty Development Competitive
Research Grant Program (FDCRGP), through grant number 021220FD1651.