Tools for Compiler Construction

We are trying to create and implement (and eventually also use) a new programming language for the implementation of logical algorithms and computer algebra. Logic is special because it requires tree structures that can have many different forms. In my view, these data structures reveal a weakness that all current programming languages with static type checking have: We don't know how to statically type check union or partiality. The problem of partiality exists in many applications, but it shows up most clearly in the implementation of logic.

All existing approaches (union, std::variant, inheritance, type casts, std::optional, std::expected, use of nullptr, use of special value like std::string::npos) are unsatisfactory, because they cannot be statically type checked.

Checking union and partiality is currently impossible due to unsolved theoretical questions, which I hope to solve. Despite these unsolved theoretical problems, the project resulted in techniques and software that make it easy to implement logic in C++, so that in a broader sense, the problem of efficiently implementing logic was solved. Students have implemented proof checkers, and are using the developed techniques.

The following software is available on this page:

The parser generator works like Yacc, but supports C++ and RAII. It creates LALR parsers, and in addition allows extending the language at run time, which is needed for example by Prolog.

The tokenizer generator does not create a complete tokenizer, but only something called classifier. The classifier reads a part of the input and determines to which type of symbol it belongs. Generating a full tokenizer would result in a generator that is too rigid to be useful. It is possible to extend the generated classifier by hand. This is needed for non-regular tokens, or for tokenizers that consider indentation.

The tree-type generator reads a set of specifications of tree-like (algebraic) datatypes, and creates C++ implementations of these datatypes. The specifications are as concise as in a functional language (if not more concise), while at the same time compiled to efficient C++ code. The generated tree classes have an API that is pleasant to use.

In addition to materials (code and documents) directly related to the development of our programming language and implementation of logic, this site also contains slide sets about topics related to compiler construction. The slides were written while teaching compiler construction courses at Instytut Informatyki Uniwersytetu Wrocławskiego and the School of Engineering and Digital Sciences at Nazarbayev University.

Parser and Tokenizer Generation for C++ (Maphoon)

After having looked at existing parser generators, I concluded that none of them fulfilled the needs of my project, so I decided to write my own. I tried to design it in such a way that it will be useful for other compiler implementors as well. The parser generator is called Maphoon. In addition to Maphoon, I also created a tokenizer generator. Both can be obtained from here.

Tree Generator

TreeGen is a C++ program that automatically creates C++ class definitions from definitions of recursive data types. TreeGen is by itself written in C++. It can be obtained from here.

Turing Machine Simulator

Using Maphoon, we created a simulator for Turing machines. It is fast, easy to use, and it can handle non-deterministic Turing machines. It can be obtained from here.

Presentation at C++ Now (2023)

I gave a presentation at C++Now 2023 with title 'Trees for Logic and Parsing'. The slides and the code fragments that were shown in the lecture are here, and here is a recording of the presentation. Note, however that the tree generator has been improved significantly since the presentation.

Presentation at C++-now (2022)

I gave a lecture at C++Now 2022 about parsing, tokenizing and Maphoon. The slides and the code fragments that were shown in the lecture are here, and this is a recording of the presentation.

Slide Sets

Below are some slides that may be relevant for teaching compiler construction or developing a compiler:

Attribute Grammars

Slides about attribute grammars.

Top-Down Parsing

One usually has to redesign the grammar in order to make it suitable for top-down parsing. One must merge common prefixes in rules, and remove left recursion. Slides

Bottom-Up Parsing

Slides about bottom-up parsing. Read these, if you want to understand the theory behind Maphoon.

Semantic Analysis

These slides give an overview of type checking for C. The slides are a bit simplified, but contain all the main features of C, like the lval/rval distinction and the implicit conversions from array to pointer.

Translation of C into Stack Machine

The first set of slides introduces an untyped stack machine. The second set of slides adds types to the stack machine, and explains how to translate C into the stack machine.

Lambda Calculus

In Fall 2022, I taught lambda calculus to master students. Here are slides about how to define inductive types in lambda calculus, and these are slides about the Hindley-Milner type checking algorithm. I think that the natural way to present the HM algorithm is by using a clause based approach.

Author

This page was created by Hans de Nivelle.

Acknowledgements

Thanks to Danel Batyrbek, Witold Charatonik, Aleksandra Kireeva, Tatyana Korotkova, Akhmetzhan Kussainov, Dina Muktubayeva, Cláudia Nalon, Olzhas Zhangeldinov.

Special thanks for Zoltan Teki for extensively testing Maphoon, and requesting additional features. He found a bug in the tokenizer generator, and thanks to him Maphoon now supports move-only attributes.

This research was sponsored by Nazarbayev University through the Faculty Development Competitive Research Grant Program (FDCRGP), through grant number 021220FD1651.