*************************
DHParser Reference Manual
*************************

This reference manual explains the technology used by DHParser. It is
intended for people who would like to extend or contribute to
DHParser. The reference manual does not explain how a Domain Specific
Language (DSL) is developed (see the User's Manual for that). Instead, it
explains the technical approach that DHParser employs for parsing,
abstract syntax tree transformation and compilation of a given
DSL, and it describes the module and class structure of the DHParser
software. This manual requires a working knowledge of Python
programming and a basic understanding of common parser technology from
the reader. It is also recommended to read the introduction and the
user's guide first.

Fundamentals
============

DHParser is a parser generator aimed at, but not restricted to, the
creation of domain specific languages in the Digital Humanities (DH),
hence the name "DHParser". In the Digital Humanities, DSLs allow users to
enter annotated texts or data in a human-friendly and readable form
with a text editor. In contrast to the prevailing XML approach, the
DSL approach distinguishes between a human-friendly *editing data
format* and a machine-friendly *working data format*, which can be XML
but does not need to be. Therefore, the DSL approach requires an
additional step to reach the *working data format*, namely the
compilation of the annotated text or data written in the DSL (editing
data format) to the working data format. In the following, a text or
data file written in a DSL will simply be called a *document*. The
editing data format will also be called the *source format* and the
working data format the *target format*.

Compiling a document specified in a domain specific language involves the following steps:

1. **Parsing** the document which results in a representation of the document as a concrete
   syntax tree.

2. **Transforming** the concrete syntax tree (CST) into an abstract syntax tree (AST), i.e. a
   streamlined and simplified syntax tree ready for compilation.

3. **Compiling** the abstract syntax tree into the working data format.
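
The three steps above can be pictured as a chain of three functions. The
following is only a minimal sketch for a toy "key: value" language. All names
in it (`parse`, `transform`, `compile_tree`, `compile_document`) are
hypothetical and do not belong to DHParser. Trees are modelled as plain tuples,
and the XML output is not escaped.

.. code-block:: python

   # Hypothetical toy pipeline; none of these names are part of DHParser.
   from typing import Tuple

   Tree = Tuple[str, object]   # (node name, text or list of child trees)

   def parse(document: str) -> Tree:
       """Step 1: build a concrete syntax tree that keeps every token."""
       entries = []
       for line in document.splitlines():
           key, _, value = line.partition(":")
           entries.append(
               ("entry", [("key", key), ("colon", ":"), ("value", value)]))
       return ("document", entries)

   def transform(cst: Tree) -> Tree:
       """Step 2: streamline the CST into an AST (drop delimiters, trim text)."""
       name, content = cst
       if isinstance(content, str):
           return (name, content.strip())
       children = [transform(child) for child in content if child[0] != "colon"]
       return (name, children)

   def compile_tree(ast: Tree) -> str:
       """Step 3: compile the AST into the target format (here: unescaped XML)."""
       name, content = ast
       if isinstance(content, str):
           return f"<{name}>{content}</{name}>"
       inner = "".join(compile_tree(child) for child in content)
       return f"<{name}>{inner}</{name}>"

   def compile_document(document: str) -> str:
       """Apply the three steps in sequence."""
       return compile_tree(transform(parse(document)))

For the input `author: Kant` this sketch returns
`<document><entry><key>author</key><value>Kant</value></entry></document>`.
The colon token is present in the CST but no longer in the AST.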

All of these steps are carried out by the computer without any user intervention, i.e. without the
need for humans to rewrite or enrich the data during any of these steps. A DSL-compiler therefore
consists of three components which are applied in sequence: a *parser*, a *transformer* and a
*compiler*. Creating, i.e. programming, these components is the task of compiler construction.
DHParser supports the creation of all of these components, albeit to different degrees:

1. *Creating a parser*: DHParser fully automates parser generation. Once the syntax of the DSL
   is formally specified, it can be compiled into a Python class that is able to parse any
   document written in the DSL. DHParser uses Parsing-Expression-Grammars in a variant of the
   Extended-Backus-Naur-Form (EBNF) for the specification of the syntax. (See
   `examples/EBNF/EBNF.ebnf` for an example.)

2. *Specifying the AST-transformations*: DHParser supports the AST-transformation with a
   depth-first tree traversal algorithm (see `DHParser.transform.traverse`) and a number of
   stock transformation functions which can also be combined. Most of the AST-transformation is
   specified in a declarative manner by filling in a transformation-dictionary which associates
   the node-types of the concrete syntax tree with such combinations of transformations. See
   `DHParser.ebnf.EBNF_AST_transformation_table` for an example.

3. *Filling in the compiler class skeleton*: Compiler generation cannot be automated like parser
   generation. DHParser supports it merely by generating a skeleton of a compiler class
   with a method-stub for each definition (or "production", as definitions are sometimes also
   called) of the EBNF-specification. (See `examples/EBNF/EBNFCompiler.py`.) If the target format
   is XML, there is a chance that the XML can simply be generated by serializing the abstract
   syntax tree as XML without the need for a dedicated compilation step.
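
To illustrate the idea of a compiler skeleton with one method-stub per
definition, here is a minimal sketch. It is not the code that DHParser
generates: the classes `Node` and `ToyCompiler`, the `on_` method prefix and
the two productions `document` and `entry` are assumptions made for this
example only.

.. code-block:: python

   # Hypothetical sketch; not the skeleton that DHParser actually generates.
   class Node:
       def __init__(self, name, content):
           self.name = name        # name of the definition that matched
           self.content = content  # str for leaves, list of Node otherwise

   class ToyCompiler:
       """Holds one on_<definition> method per definition of the grammar."""

       def compile(self, node):
           # Dispatch on the node name; use the fallback if no stub exists.
           method = getattr(self, "on_" + node.name, self.fallback)
           return method(node)

       def fallback(self, node):
           # Default behaviour: concatenate the compiled children.
           if isinstance(node.content, str):
               return node.content
           return "".join(self.compile(child) for child in node.content)

       # Stubs of the kind a generator might emit; bodies are filled in by hand.
       def on_document(self, node):
           return "\n".join(self.compile(child) for child in node.content)

       def on_entry(self, node):
           key, value = node.content
           return f"{self.compile(key)}={self.compile(value)}"

In this sketch only the method bodies carry language-specific knowledge. The
dispatch in `compile` stays the same for every grammar, which is why a
skeleton of this shape lends itself to being generated.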

Compiler Creation Workflow
==========================

TODO: Describe:

- setting up a new project
- invoking the DSL compiler
- conventions and data types
- the flat namespace of DHParser

Component Guide
===============

Parser
------

Parser creation is supported by DHParser through an EBNF to Python compiler, which yields a working
Python class that parses any document written in the EBNF-specified DSL into a tree of Node-objects,
which are instances of the `class Node` defined in `DHParser/syntaxtree.py`.
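
How a parser built from a Parsing-Expression-Grammar turns text into a tree of
node objects can be sketched with a few parser combinators. This is a
simplified illustration, not DHParser's parser classes: `Node`, `token`,
`sequence` and `choice` as well as the toy grammar are hypothetical names
chosen for this example.

.. code-block:: python

   # Hypothetical sketch of PEG-style parsing; not DHParser's parser classes.
   import re

   class Node:
       def __init__(self, name, content):
           self.name = name
           self.content = content   # str for leaves, list of Node otherwise

   def token(name, pattern):
       """Parser for a regular expression; yields a leaf node on success."""
       regex = re.compile(pattern)
       def parser(text, pos):
           match = regex.match(text, pos)
           if match is None:
               return None, pos
           return Node(name, match.group()), match.end()
       return parser

   def sequence(name, *parsers):
       """All parsers must match one after another."""
       def parser(text, pos):
           children, start = [], pos
           for p in parsers:
               node, pos = p(text, pos)
               if node is None:
                   return None, start   # backtrack to where the sequence began
               children.append(node)
           return Node(name, children), pos
       return parser

   def choice(*parsers):
       """Ordered choice: the first matching alternative wins."""
       def parser(text, pos):
           for p in parsers:
               node, new_pos = p(text, pos)
               if node is not None:
                   return node, new_pos
           return None, pos
       return parser

   # Toy grammar:  assignment = name "=" (number | name)
   name = token("name", r"[a-z]+")
   number = token("number", r"[0-9]+")
   assignment = sequence("assignment", name, token("equals", r"="),
                         choice(number, name))

Calling `assignment("x=42", 0)` returns a node named `assignment` with three
children and the end position 4. A failed match returns `None` together with
the unchanged start position.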

The EBNF to Python compiler is actually a DSL-compiler that has been crafted with DHParser
itself. It is located in `DHParser/ebnf.py`. The formal specification of the EBNF variant
used by DHParser can be found in `examples/EBNF/EBNF.ebnf`. Comparing the automatically
generated `examples/EBNF/EBNFCompiler.py` with `DHParser/ebnf.py` can give you an idea of what
additional work is needed to create a DSL-compiler from an autogenerated DSL-parser. In most
DH-projects this task will be less complex, however, as the target format is XML, which
usually can be derived from the abstract syntax tree in fewer steps than the Python code in
the case of DHParser's EBNF to Python compiler.

AST-Transformation
------------------

Unlike compiler generation (see the next section), AST-transformation follows a functional
rather than an object-oriented approach. This allows for a more concise
specification of the AST-transformation, since typically the same combination of
transformations can be used for several node types of the AST. It would therefore be tedious
to fill in a method for each of these. In a sense, the specification of AST-transformation
constitutes an "internal DSL" realized with the means of the Python language itself.
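
A table-driven transformation of this kind can be sketched as follows. The
sketch only mirrors the idea and is not taken from `DHParser.transform`: the
`Node` class, the function `walk_depth_first`, the three transformation
functions and the node names in the table are all hypothetical.

.. code-block:: python

   # Hypothetical sketch; these helpers are not DHParser's stock functions.
   class Node:
       def __init__(self, name, content):
           self.name = name
           self.content = content   # str for leaves, list of Node otherwise

   def walk_depth_first(node, table):
       """Transform the children first, then the node itself."""
       if isinstance(node.content, list):
           for child in node.content:
               walk_depth_first(child, table)
       for transformation in table.get(node.name, []):
           transformation(node)

   def trim_text(node):
       if isinstance(node.content, str):
           node.content = node.content.strip()

   def drop_empty_children(node):
       if isinstance(node.content, list):
           node.content = [c for c in node.content if c.content != ""]

   def lift_single_child(node):
       # A node with exactly one child is replaced by that child.
       if isinstance(node.content, list) and len(node.content) == 1:
           child = node.content[0]
           node.name, node.content = child.name, child.content

   # The same combination is reused for several node types.
   flatten = [drop_empty_children, lift_single_child]
   table = {
       "whitespace": [trim_text],
       "word": [trim_text],
       "expression": flatten,
       "term": flatten,
   }

Because `flatten` is an ordinary list, it can be assigned to any number of node
types without writing a separate method for each of them. Applied to an
`expression` node that wraps a `term` containing blank whitespace and the word
`" x "`, the traversal leaves a single `word` node with the content `x`.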

Compiler
--------


Module Structure of DHParser
============================

Class Hierarchy of DHParser
===========================