Maphoon

Maphoon is a C++ based parser generator. It is written in C++, and it creates parsers in C++. It consists of two parts, a tokenizer generator and a parser generator. The parser generator works like Yacc, but supports C++ and RAII. It creates LALR parsers, and in addition allows extending the language at run time, which is needed for example by Prolog.

The tokenizer generator does not create a complete tokenizer, but something called a classifier. The classifier reads part of the input, and determines to which symbol type it belongs. Generating a full tokenizer would result in a tokenizer that is too rigid to be useful. The generated classifier can be extended by hand. This is useful for non-regular tokens, or for tokenizers that need to consider indentation.

Design Goals of the Parser Generator

The parser generator has the following design goals:
  1. The constructed parser is bottom-up. Bottom-up parsing is theoretically and practically superior to top-down parsing. There is usually no need to adapt the grammar, and attribute computation can be specified directly with the grammar rules.
  2. It is possible to use C++17 in action code. Concretely, it is possible to use attribute types with resource invariants (copy constructor, copying assignment, destructors) without effort. (Think of STL-containers, smart pointers, etc.) If all attribute types are movable, the parser will take advantage of this. It is possible to use move-only attributes, i.e. attributes that cannot be copied.
  3. The parser generator and the constructed parsers use portable C++ without need for additional libraries.
  4. It is easy to maintain source information in symbols for the generation of error messages.
  5. It is possible to define operators at run time. This is useful if one wants to implement a parser for Prolog. Even if one does not want to extend the language at runtime, using the mechanism often results in simpler parsers that can be more easily modified. The same mechanism can be used for defining context-sensitive key words.
  6. It is possible to define meaningful error messages. Defining error messages is based on regular expressions that trigger expectations. The error messages explain what was expected, and what was obtained instead.
While designing Maphoon, we took CUP as main example of a nice parser generator. We wanted the same in C++ and added a few extra features.

Design Goals of the Tokenizer Generator

The tokenizer generator has the following design goals:
  1. The constructed tokenizer is flexible. Our experience with previous tokenizer generators is that they are borderline useful. Main problem is that the constructed tokenizers are too rigid. Often there are a few missing features that one wants to add but cannot, the language may have a few non-regular symbols, and the user has no control over the way source information is stored in tokens. In order to allow flexibility, we don't generate a complete tokenizer, but only a function that reads input into a buffer and reports the symbol type that it belongs too. The user can add other functions that read non-regular symbols into the buffer, or define a function that reads whitespace without storing it in the buffer.
  2. It is easy to start using it. In order to obtain this, pairs of regular expressions and their classifications are initially written in code. A collection of such pairs is stored in a container called classifier. Using the classifier, a function called readandclassify( ) is called, which matches the input to the regular expressions. Function readandclassify( ) returns a pair consisting of the number of read characters together with the classification. After that, one can do a case analysis on the classification.
  3. It is possible to construct efficient tokenizers. Building a classifier in code is easy, but not efficient, because regular expressions are translated into automata every time the classifier is constructed. In order to obtain efficient tokenizers, it is possible to print the classifier as C++ code, and use the C++ code. This can be done when the classifier is finished. The resulting tokenizer is efficient. The idea of directly generating code instead of tables was taken from re2c.

Download

Both the tokenizer and the parser generator are published under the Aladdin Free Public License

The sources of the lexer generator are here, and the manual is here.

The sources of the parser generator are here, and the manual is here. In order to compile the parser generator, edit the Makefile and set Lexing to the directory containing directory lexing2023. Set Maphoon to the directory containing maphoon2024. If you want to run the multimodal logic example, you will also need need TVM and utilities. In order to run, edit the Makefiles to set the directories and type 'make'. The zip-file contains the following examples: