DHParser Reference Manual

This reference manual explains the technology used by DHParser. It is intended for people who would like to extend or contribute to DHParser. The reference manual does not explain how a Domain Specific Language (DSL) is developed (see the User’s Manual for that). Instead, it explains the technical approach that DHParser employs for parsing, abstract syntax tree transformation and compilation of a given DSL, and it describes the module and class structure of the DHParser software. This manual requires a working knowledge of Python programming and a basic understanding of common parser technology from the reader. It is also recommended to read the introduction and the user’s guide first.

Fundamentals

DHParser is a parser generator aimed at, but not restricted to, the creation of domain specific languages in the Digital Humanities (DH), hence the name “DHParser”. In the Digital Humanities, DSLs allow annotated texts or data to be entered in a human-friendly and readable form with a text editor. In contrast to the prevailing XML-approach, the DSL-approach distinguishes between a human-friendly editing data format and a machine-friendly working data format, which can be XML but does not need to be. Therefore, the DSL-approach requires an additional step to reach the working data format, namely the compilation of the annotated text or data written in the DSL (editing data format) to the working data format. In the following, a text or data file written in a DSL will simply be called a document. The editing data format will also be called the source format and the working data format the target format.

Compiling a document specified in a domain specific language involves the following steps:

  1. Parsing the document which results in a representation of the document as a concrete syntax tree.
  2. Transforming the concrete syntax tree (CST) into an abstract syntax tree (AST), i.e. a streamlined and simplified syntax tree ready for compilation.
  3. Compiling the abstract syntax tree into the working data format.
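
The three steps can be illustrated with a deliberately tiny, self-contained sketch. It does not use DHParser at all: the toy "key = value" language, the tuple-based tree representation and the function names parse, transform and compile_tree are assumptions made up for this illustration only.

```python
# Toy illustration only: none of these names belong to DHParser.
import re
from typing import Tuple, Union
from xml.sax.saxutils import escape

# A tree node is a pair: (node name, text of a leaf or list of child nodes).
Tree = Tuple[str, Union[str, list]]


def parse(document: str) -> Tree:
    """Step 1: turn 'key = value' lines into a concrete syntax tree (CST).

    The CST still contains every detail of the source, including
    whitespace and the '=' delimiter.
    """
    entries = []
    for line in document.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        match = re.fullmatch(r'(\s*)(\w+)(\s*=\s*)(.*?)(\s*)', line)
        if match is None:
            raise ValueError(f'syntax error in line: {line!r}')
        ws1, key, eq, value, ws2 = match.groups()
        entries.append(('entry', [('WS', ws1), ('key', key), ('EQ', eq),
                                  ('value', value), ('WS', ws2)]))
    return ('document', entries)


def transform(cst: Tree) -> Tree:
    """Step 2: drop whitespace and delimiter nodes to obtain the AST."""
    name, content = cst
    if isinstance(content, str):
        return cst
    children = [transform(child) for child in content
                if child[0] not in ('WS', 'EQ')]
    return (name, children)


def compile_tree(ast: Tree) -> str:
    """Step 3: compile the AST into the target format (here: XML)."""
    name, content = ast
    if isinstance(content, str):
        return f'<{name}>{escape(content)}</{name}>'
    inner = ''.join(compile_tree(child) for child in content)
    return f'<{name}>{inner}</{name}>'


source = 'author = Kant\ntitle = Kritik'
print(compile_tree(transform(parse(source))))
```

Each stage consumes only the output of the previous one, which is why the three components can be developed and tested separately.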

All of these steps are carried out by the computer without any user intervention, i.e. without the need for humans to rewrite or enrich the data during any of these steps. A DSL-compiler therefore consists of three components which are applied in sequence: a parser, a transformer and a compiler. Creating, i.e. programming, these components is the task of compiler construction. The creation of all of these components is supported by DHParser, albeit to different degrees:

  1. Creating a parser: DHParser fully automates parser generation. Once the syntax of the DSL is formally specified, it can be compiled into a Python class that is able to parse any document written in the DSL. DHParser uses Parsing-Expression-Grammars in a variant of the Extended-Backus-Naur-Form (EBNF) for the specification of the syntax. (See examples/EBNF/EBNF.ebnf for an example.)
  2. Specifying the AST-transformations: DHParser supports the AST-transformation with a depth-first tree traversal algorithm (see DHParser.transform.traverse) and a number of stock transformation functions which can also be combined. Most of the AST-transformation is specified in a declarative manner by filling in a transformation-dictionary which associates the node-types of the concrete syntax tree with such combinations of transformations. See DHParser.ebnf.EBNF_AST_transformation_table for an example.
  3. Filling in the compiler class skeleton: Compiler generation cannot be automated like parser generation. DHParser supports it merely by generating a skeleton of a compiler class with a method-stub for each definition (or “production”, as the definitions are sometimes also called) of the EBNF-specification. (See examples/EBNF/EBNFCompiler.py.) If the target format is XML, there is a chance that the XML can simply be generated by serializing the abstract syntax tree as XML without the need for a dedicated compilation step.
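
To make the idea of a compiler class skeleton more tangible, here is a minimal, self-contained sketch of a class with one method per production. It is not DHParser's generated code: the simplified Node class, the CompilerSkeleton base class, the on_<name> naming scheme and the toy productions document, entry, key and value are all assumptions chosen for this illustration.

```python
# Hypothetical sketch: class layout and method naming are assumptions
# made for illustration; they do not reproduce DHParser's generated code.

class Node:
    """Simplified tree node: result is a str (leaf) or a tuple of Nodes."""

    def __init__(self, name, result):
        self.name = name
        self.result = result

    @property
    def children(self):
        return self.result if isinstance(self.result, tuple) else ()


class CompilerSkeleton:
    """Dispatches every node to the method named after its production."""

    def compile(self, node):
        method = getattr(self, 'on_' + node.name, self.fallback)
        return method(node)

    def fallback(self, node):
        # Used for productions without a dedicated method.
        if node.children:
            return ''.join(self.compile(child) for child in node.children)
        return node.result


class KeyValueCompiler(CompilerSkeleton):
    """One method per production. A generated skeleton would contain only
    empty stubs; the method bodies below are what is filled in by hand."""

    def on_document(self, node):
        return dict(self.compile(child) for child in node.children)

    def on_entry(self, node):
        key, value = node.children
        return (self.compile(key), self.compile(value))

    def on_key(self, node):
        return node.result

    def on_value(self, node):
        return node.result.strip()


ast = Node('document', (
    Node('entry', (Node('key', 'author'), Node('value', ' Kant '))),
))
print(KeyValueCompiler().compile(ast))
```

In this sketch the target format is a Python dictionary. The same dispatch scheme would work for any other target format, since only the method bodies change.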

Compiler Creation Workflow

TODO: Describe: - setting up a new project - invoking the DSL compiler - conventions and data types - the flat namespace of DHParser

Component Guide

Parser

Parser-creation is supported by DHParser through an EBNF to Python compiler, which yields a working Python class that parses any document written in the EBNF-specified DSL into a tree of Node-objects. These are instances of the class Node defined in DHParser/syntaxtree.py.

The EBNF to Python compiler is itself a DSL-compiler that has been crafted with DHParser. It is located in DHParser/ebnf.py. The formal specification of the EBNF variant used by DHParser can be found in examples/EBNF/EBNF.ebnf. Comparing the automatically generated examples/EBNF/EBNFCompiler.py with DHParser/ebnf.py can give you an idea of the additional work that is needed to create a DSL-compiler from an autogenerated DSL-parser. In most DH-projects this task will be less complex, however, as the target format is XML, which can usually be derived from the abstract syntax tree with fewer steps than the Python code in the case of DHParser’s EBNF to Python compiler.

AST-Transformation

Unlike for compiler generation (see the next section), a functional rather than object-oriented approach has been employed for the AST-transformation. This allows for a more concise specification, since typically the same combination of transformations can be used for several node types of the AST, and it would be tedious to fill in a separate method for each of them. In a sense, the specification of the AST-transformation constitutes an “internal DSL” realized with the means of the Python language itself.
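
The following self-contained sketch shows how such a table-driven, functional transformation could look in principle. It is not DHParser's implementation: the simplified Node class, the traverse function and the transformation functions remove_whitespace, reduce_single_child and strip_text are stand-ins written for this example, and the node names term, factor, NUMBER and WS are invented.

```python
# Illustrative sketch under stated assumptions; not DHParser's own code.

class Node:
    """Simplified tree node: result is a str (leaf) or a list of Nodes."""

    def __init__(self, name, result):
        self.name = name
        self.result = result

    @property
    def children(self):
        return self.result if isinstance(self.result, list) else []


def remove_whitespace(node):
    """Drop all child nodes named 'WS'."""
    if node.children:
        node.result = [c for c in node.children if c.name != 'WS']


def reduce_single_child(node):
    """If a node has exactly one child, take over that child's result."""
    if len(node.children) == 1:
        node.result = node.children[0].result


def strip_text(node):
    """Strip surrounding whitespace from the text of a leaf node."""
    if isinstance(node.result, str):
        node.result = node.result.strip()


def traverse(node, table):
    """Depth-first traversal: children are transformed before their parent."""
    for child in node.children:
        traverse(child, table)
    for transformation in table.get(node.name, []):
        transformation(node)


# The same combination of transformations is shared by several node types,
# so it is written down once instead of once per method.
flatten = [remove_whitespace, reduce_single_child]
transformation_table = {
    'term': flatten,
    'factor': flatten,
    'NUMBER': [strip_text],
}

cst = Node('term', [
    Node('WS', ' '),
    Node('factor', [Node('NUMBER', ' 42 '), Node('WS', ' ')]),
])
traverse(cst, transformation_table)
print(cst.name, repr(cst.result))
```

Because the table is an ordinary dictionary of function lists, adding a node type or reusing a combination of transformations is a one-line change, which is the conciseness argument made above.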

Compiler

Module Structure of DHParser

Class Hierarchy of DHParser