DHParser
========

A parser combinator based parsing and compiling infrastructure for domain
specific languages (DSL) in Digital Humanities projects.

Author: Eckhart Arnold, Bavarian Academy of Sciences
Email:  arnold@badw.de


License
-------

DHParser is open source software under the [MIT License](https://opensource.org/licenses/MIT).



Purpose
-------

Domain-specific languages are widespread in computer science, but
seem to be underused in the Digital Humanities. While DSLs are
sometimes introduced to Digital Humanities projects as
[practical ad hoc solutions][Müller_2016], these solutions are often
somewhat "quick and dirty". In other words, they are more of a hack
than a technology. The purpose of DHParser is to introduce
[DSLs as a technology][Arnold_2016] to the Digital Humanities. It is
based on the well-known technology of [EBNF][ISO_IEC_14977]-based
parser generators, but employs the more modern form called
"[parsing expression grammar][Ford_2004]" and
[parser combinators][Ford_20XX] as a variant of the classical
recursive descent parser.
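
To make the idea of parser combinators concrete, here is a minimal
sketch in Python. It is not DHParser's API: all names (`literal`,
`sequence`, `choice`, `zero_or_more`, `matches`) are made up for this
illustration, and the parsers only recognize input instead of building
a syntax tree. Each parser is a plain function, and larger parsers are
composed from smaller ones, with `choice` behaving like the ordered
choice of a parsing expression grammar.

```python
from typing import Callable, Optional

# A parser takes (text, position) and returns the position after the
# match, or None on failure. Illustrative names only, not DHParser's API.
Parser = Callable[[str, int], Optional[int]]


def literal(s: str) -> Parser:
    """Match the exact string s."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def sequence(*parsers: Parser) -> Parser:
    """Match all parsers one after another (PEG: e1 e2)."""
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            pos = result
        return pos
    return parse


def choice(*parsers: Parser) -> Parser:
    """Ordered choice: the first alternative that matches wins (PEG: e1 / e2)."""
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def zero_or_more(parser: Parser) -> Parser:
    """Greedy repetition (PEG: e*); stops when no progress is made."""
    def parse(text: str, pos: int) -> Optional[int]:
        while True:
            result = parser(text, pos)
            if result is None or result == pos:
                return pos
            pos = result
    return parse


def digit(text: str, pos: int) -> Optional[int]:
    """Match a single decimal digit."""
    return pos + 1 if pos < len(text) and text[pos].isdigit() else None


# Toy grammar:  expr <- term ('+' term)*
#               term <- digit+ / '(' expr ')'
# The rules are functions so that they can refer to each other recursively.
def expr(text: str, pos: int) -> Optional[int]:
    return sequence(term, zero_or_more(sequence(literal("+"), term)))(text, pos)


def term(text: str, pos: int) -> Optional[int]:
    return choice(sequence(digit, zero_or_more(digit)),
                  sequence(literal("("), expr, literal(")")))(text, pos)


def matches(text: str) -> bool:
    """True if the whole text is an expression of the toy grammar."""
    return expr(text, 0) == len(text)
```

Under these assumptions, `matches("1+(23+4)")` succeeds, while
`matches("1+")` fails because the trailing `+` is not followed by a term.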

Why another parser generator? There are plenty of good parser
generators out there,
e.g. [Añez's grako parser generator][Añez_2017]. However, DHParser is
intended as a tool that is specifically geared towards digital
humanities applications, while most existing parser generators come
from compiler construction toolkits for programming languages. Also,
DHParser shall (in the future) serve as a teaching tool, which
influences some of its design decisions such as, for example, clearly
separating the parsing, syntax-tree-transformation and compilation
stages. Finally, DHParser is intended as a tool to experiment with. One
possible research area is how grammars that are not
[context-free](https://en.wikipedia.org/wiki/Context-free_grammar),
such as the grammars of [TeX][tex_stackexchange_no_bnf] or
[CommonMark][MacFarlane_et_al_2017], can be described with declarative
languages in the spirit of, but beyond, EBNF, and what extensions of the
parsing technology are necessary to capture such languages.
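
As a small illustration of what "beyond context-free" can mean, the
following sketch recognizes the language aⁿbⁿcⁿ (n ≥ 1), which no
context-free grammar can describe. It uses a lookahead (the
"and-predicate" of parsing expression grammars), an example often
cited in the PEG literature. The code is a hand-written recognizer
with made-up function names; it does not use DHParser and only hints
at the kind of extension the paragraph above refers to.

```python
from typing import Optional

# PEG being implemented (illustrative, not taken from DHParser):
#   S <- &(A 'c') 'a'+ B !.
#   A <- 'a' A? 'b'
#   B <- 'b' B? 'c'


def match_nested(text: str, pos: int, opener: str, closer: str) -> Optional[int]:
    """Match  X <- opener X? closer  and return the end position or None."""
    if not text.startswith(opener, pos):
        return None
    pos += 1
    inner = match_nested(text, pos, opener, closer)
    if inner is not None:  # the optional inner X matched, so keep its result
        pos = inner
    return pos + 1 if text.startswith(closer, pos) else None


def is_anbncn(text: str) -> bool:
    """Recognize a^n b^n c^n for n >= 1."""
    # Lookahead &(A 'c'): check for a^n b^n followed by 'c' without consuming.
    ahead = match_nested(text, 0, "a", "b")
    if ahead is None or not text.startswith("c", ahead):
        return False
    # 'a'+ : skip the leading a's (at least one exists, since A matched).
    pos = 0
    while text.startswith("a", pos):
        pos += 1
    # B : the remaining input must be b^n c^n ...
    end = match_nested(text, pos, "b", "c")
    # ... and !. requires that nothing follows.
    return end == len(text)
```

The lookahead ensures that the numbers of a's and b's agree, and rule
`B` ensures that the numbers of b's and c's agree, so `is_anbncn("aabbcc")`
holds while `is_anbncn("aabbc")` does not.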

Primary use case at the Bavarian Academy of Sciences and Humanities
(for the time being): A DSL for the
"[Mittellateinische Wörterbuch](http://www.mlw.badw.de/)"!

Further (intended) use cases are:

* LaTeX -> XML/HTML conversion. See this
  [discussion on why an EBNF-parser for the complete TeX/LaTeX-grammar][tex_stackexchange_no_bnf]
  is not possible.
* [CommonMark][MacFarlane_et_al_2017] and other DSLs for cross-media
  publishing of scientific literature, e.g. journal articles.
  (CommonMark and Markdown also go beyond what is feasible with pure
  EBNF-based parsers.)
* EBNF itself. DHParser is already self-hosting ;-)
* Digital and cross-media editions 
* Digital dictionaries


Description
-----------

... coming soon ;-)

For a simple self-test, run `dhparser.py` from the command line. This
compiles the EBNF grammar in `examples/EBNF/EBNF.ebnf` and outputs the
Python-based parser class representing that grammar. The concrete and
abstract syntax trees as well as a full and an abbreviated log of the
parsing process will be stored in a sub-directory named "DEBUG".



References
----------

Juancarlo Añez: grako, a PEG parser generator in Python, 2017. URL:
[bitbucket.org/apalala/grako][Añez_2017]

[Añez_2017]: https://bitbucket.org/apalala/grako


Eckhart Arnold: Domänenspezifische Notationen. Eine (noch)
unterschätzte Technologie in den Digitalen Geisteswissenschaften,
presentation at the
[dhmuc-Workshop: Digitale Editionen und Auszeichnungssprachen](https://dhmuc.hypotheses.org/workshop-digitale-editionen-und-auszeichnungssprachen),
Munich 2016. Short URL: [tiny.badw.de/2JVT][Arnold_2016]

[Arnold_2016]: https://f.hypotheses.org/wp-content/blogs.dir/1856/files/2016/12/EA_Pr%C3%A4sentation_Auszeichnungssprachen.pdf


Brian Ford: Parsing Expression Grammars: A Recognition-Based Syntactic
Foundation, Cambridge,
Massachusetts, 2004. Short URL: [http://t1p.de/jihs][Ford_2004]

[Ford_2004]: https://pdos.csail.mit.edu/~baford/packrat/popl04/peg-popl04.pdf
  
[Ford_20XX]: http://bford.info/packrat/ 


Richard A. Frost, Rahmatullah Hafiz and Paul Callaghan: Parser
Combinators for Ambiguous Left-Recursive Grammars, in: P. Hudak and
D.S. Warren (Eds.): PADL 2008, LNCS 4902, pp. 167–181, Springer-Verlag
Berlin Heidelberg 2008.


Dominikus Herzberg: Objekt-orientierte Parser-Kombinatoren in Python,
blog post, September 18th, 2008, on denkspuren. gedanken, ideen,
anregungen und links rund um informatik-themen, short URL:
[http://t1p.de/bm3k][Herzberg_2008a]

[Herzberg_2008a]: http://denkspuren.blogspot.de/2008/09/objekt-orientierte-parser-kombinatoren.html


Dominikus Herzberg: Eine einfache Grammatik für LaTeX, blog post,
September 18th, 2008, on denkspuren. gedanken, ideen, anregungen und
links rund um informatik-themen, short URL:
[http://t1p.de/7jzh][Herzberg_2008b]

[Herzberg_2008b]: http://denkspuren.blogspot.de/2008/09/eine-einfache-grammatik-fr-latex.html


Dominikus Herzberg: Uniform Syntax, blog post, February 27th, 2007, on
denkspuren. gedanken, ideen, anregungen und links rund um
informatik-themen, short URL: [http://t1p.de/s0zk][Herzberg_2007]

[Herzberg_2007]: http://denkspuren.blogspot.de/2007/02/uniform-syntax.html


[ISO_IEC_14977]: http://www.cl.cam.ac.uk/~mgk25/iso-14977.pdf


John MacFarlane, David Greenspan, Vicent Marti, Neil Williams,
Benjamin Dumke-von der Ehe, Jeff Atwood: CommonMark. A strongly
defined, highly compatible specification of
Markdown, 2017. [commonmark.org][MacFarlane_et_al_2017]

[MacFarlane_et_al_2017]: http://commonmark.org/


Stefan Müller: DSLs in den digitalen Geisteswissenschaften,
presentation at the
[dhmuc-Workshop: Digitale Editionen und Auszeichnungssprachen](https://dhmuc.hypotheses.org/workshop-digitale-editionen-und-auszeichnungssprachen),
Munich 2016. Short URL: [tiny.badw.de/2JVy][Müller_2016]

[Müller_2016]: https://f.hypotheses.org/wp-content/blogs.dir/1856/files/2016/12/Mueller_Anzeichnung_10_Vortrag_M%C3%BCnchen.pdf


[tex_stackexchange_no_bnf]: http://tex.stackexchange.com/questions/4201/is-there-a-bnf-grammar-of-the-tex-language
 
[tex_stackexchange_latex_parsers]: http://tex.stackexchange.com/questions/4223/what-parsers-for-latex-mathematics-exist-outside-of-the-tex-engines