
Commit 5aa3acd2 authored by eckhart

some documentation added (still a stub)

parent c90818b7
......@@ -145,7 +145,8 @@ class EBNFGrammar(Grammar):
def grammar_changed(grammar_class, grammar_source: str) -> bool:
"""Returns ``True`` if ``grammar_class`` does not reflect the latest
"""
Returns ``True`` if ``grammar_class`` does not reflect the latest
changes of ``grammar_source``
Parameters:
......
......@@ -583,7 +583,7 @@ class Node(collections.abc.Sized):
def log(self, log_file_name):
"""
Writes an S-expression-representation of the tree with root `self` to a file.
"""
if is_logging():
path = os.path.join(log_dir(), log_file_name)
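# Usage sketch (an assumption, not part of this commit): with logging
# switched on, client code can dump a tree like this:
#     with toolkit.logging(True):
#         syntax_tree.log('syntax_tree.log')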
......
......@@ -17,4 +17,4 @@ permissions and limitations under the License.
"""
__all__ = ('__version__',)
__version__ = '0.7.9'  # + '_dev' + str(os.stat(__file__).st_mtime)
......@@ -5,9 +5,8 @@ Introduction to [DHParser](https://gitlab.lrz.de/badw-it/DHParser)
Motto: **Computers enjoy XML, humans don't.**
Why use domain specific languages in the humanities
---------------------------------------------------
Suppose you are a literary scientist and you would like to edit a poem
like Heinrich Heine's "Lyrisches Intermezzo". Usually, the technology of
......@@ -47,9 +46,9 @@ Now, while you might think that this all works well enough, there are
a few drawbacks to this approach:
- The syntax is cumbersome and the encoding not very legible to humans
working with it. (And I did not even use
[TEI-XML](http://www.tei-c.org/index.xml), yet...)
Editing and revising XML-encoded text is a pain. Just ask the
literary scientists who have to work with it.
- The XML encoding, especially TEI-XML, is often not intuitive. Only
......@@ -57,15 +56,15 @@ a few drawbacks to this approach:
friend, who is not into digital technologies, might help you with
proof-reading, you better think about it again.
- There is an awful lot of typing to do: All those lengthy opening
and closing tags. This takes time...
- While looking for a good XML editor, you find that hardly any
XML editors exist anymore. (And for a reason, actually...) In
particular, there are no good open source XML editors.
On the other hand, there are good reasons why XML is used in the
humanities: Important encoding standards like
[TEI-XML](http://www.tei-c.org/index.xml) are defined in
XML. Its strict syntax and the possibility to check data against
schemas help to detect and avoid encoding errors. If the schema is
......@@ -81,29 +80,29 @@ provides an infrastructure that - if you know a little
Python-programming - makes it very easy to convert your annotated text
into an XML-encoding of your choice. With DHParser, the same poem above
can be simply encoded like this:
Heinrich Heine <gnd:118548018>,
Buch der Lieder <urn:nbn:de:kobv:b4-200905192211>,
Hamburg <gnd:4023118-5>, 1927.
Lyrisches Intermezzo
IV.
Wenn ich in deine Augen seh',
so schwindet all' mein Leid und Weh!
Doch wenn ich küsse deinen Mund,
so werd' ich ganz und gar gesund.
Wenn ich mich lehn' an deine Brust,
kommt's über mich wie Himmelslust,
doch wenn du sprichst: Ich liebe dich!
so muß ich weinen bitterlich.
Yes, that's right. It is as simple as that. Observe how much more
efficacious a verse like "Wenn ich mich lehn' an deine Brust, / kommt's
über mich wie Himmelslust," can be if it is not cluttered with XML tags
;-)
You might now wonder whether the second version really does encode the
same information as the XML version. How, for example, would the
......@@ -114,13 +113,13 @@ example, a verse always starts and ends on the same line. There
is always a gap between stanzas. And the title is always written above
the poem and not in the middle of it. So, if there is a title at all, we
can be sure that what is written in the first line is the title and not
a stanza.
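To see how much structure these layout conventions already carry, here is a
toy sketch in plain Python (an illustration only, not DHParser's actual
mechanism; the file name is taken from the tutorial example below):

    import re

    # split the poem file into blocks separated by blank lines
    with open('Lyrisches_Intermezzo_IV.txt', encoding='utf-8') as f:
        blocks = [b for b in re.split(r'\n[ \t]*\n+', f.read()) if b.strip()]

    title = blocks[0]                               # whatever precedes the first gap
    stanzas = [b.splitlines() for b in blocks[1:]]  # one verse per line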
DHParser is able to exploit all those hints in order to gather much the
same information as was encoded in the XML-Version. Don't believe it?
You can try: Download DHParser from the
[gitlab-repository](https://gitlab.lrz.de/badw-it/DHParser) and enter
the directory `examples/Tutorial` on the command line interface (shell).
Just run `python LyrikCompiler_example.py` (you need to have
[Python](https://www.python.org/) version 3.4 or higher installed on your computer).
The output will be something like this:
......@@ -165,36 +164,36 @@ without further proof that it can easily be converted into the other
version and contains all the information that the other version contains.
How does DHParser achieve this? Well, there is the rub. In order to convert
the poem from the domain-specific version into the XML-version, DHParser
requires a structural description of the domain specific encoding. This
is a bit similar to a document type definition (DTD) in XML. This
structural description uses a slightly enhanced version of the
[Extended-Backus-Naur-Form (EBNF)](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form),
which is a well-established formalism for the structural description of
formal languages in computer sciences. An excerpt of the EBNF-definition
of our domain-specific encoding for the poem looks like this. (We leave out
the meta-data here. See
[`examples/Tutorial/Lyrik.ebnf`](https://gitlab.lrz.de/badw-it/DHParser/blob/master/examples/Tutorial/Lyrik.ebnf)
for the full EBNF):
gedicht = { LEERZEILE }+ [serie] §titel §text /\s*/ §ENDE
serie = !(titel vers NZ vers) { NZ zeile }+ { LEERZEILE }+
titel = { NZ zeile}+ { LEERZEILE }+
zeile = { ZEICHENFOLGE }+
text = { strophe {LEERZEILE} }+
strophe = { NZ vers }+
vers = { ZEICHENFOLGE }+
ZEICHENFOLGE = /[^ \n<>]+/~
NZ = /\n/~
LEERZEILE = /\n[ \t]*(?=\n)/~
ENDE = !/./
Without going into too much detail here, let me just explain a few basics of
this formal description: The slashes `/` enclose ordinary regular expressions.
Thus, `NZ` (short for "Neue Zeile", German for "new line") is defined as `/\n/~`, which
is the newline-token `\n` in a regular expression, plus further horizontal
whitespace (signified by the tilde `~`), if there is any.
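For illustration, the effect of the tilde can be approximated with an
ordinary Python regular expression (a sketch, under the assumption that `~`
stands for optional horizontal whitespace in this grammar):

    import re

    # roughly what NZ = /\n/~ matches: a newline plus trailing horizontal whitespace
    NZ = re.compile(r'\n[ \t]*')

    assert NZ.match('\n')                   # a bare newline matches
    assert NZ.match('\n  \t').end() == 4    # blanks and tabs after it are consumed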
The braces `{` `}` enclose items that can be repeated zero or more times; with a
......@@ -203,10 +202,10 @@ definition of `text` in the 6th line: `{ strophe {LEERZEILE} }+`. This reads as
follows: The text of the poem consists of a sequence of stanzas, each of which
is followed by a sequence of empty lines (German: "Leerzeilen"). If you now look
at the structural definition of a stanza, you find that it consists of a sequence
of verses, each of which starts on a new line, i.e. is preceded by a newline.
Can you figure out the rest? Hint: The square brackets `[` and `]` mean that an
item is optional and the `§` sign means that it is obligatory. (Strictly speaking,
the §-signs are not necessary, because an item that is not optional is always
obligatory, but the §-signs help the converter to produce more useful error
messages.)
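Incidentally, the §-sign has a direct counterpart in the Python code that
DHParser generates (excerpted later in this commit): it becomes the
`mandatory` argument of a `Series` parser. A minimal sketch, using only
names that DHParser exports (cf. the imports further below):

    from DHParser import Forward, Series, Token

    expression = Forward()
    # from the element at index `mandatory` onward, a failed match is
    # reported as an error instead of a silent non-match:
    option = Series(Token("["), expression, Token("]"), mandatory=1)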
......@@ -215,7 +214,7 @@ This should be enough for an introduction to the purpose of DSLs in the
humanities. It has shown probably the most important use case of
DHParser, i.e. as a frontend-technology for XML-encodings. Of course,
it can just as well be used as a frontend for any other kind of
structured data, like SQL or graph-structured data. The latter, by the
way, is a very reasonable alternative to XML for edition projects with a
complex transmission history. See Andreas Kuczera's Blog-entry on
["Graphdatenbanken für Historiker"](http://mittelalter.hypotheses.org/5995).
......@@ -237,19 +236,19 @@ Now, if you enter the repo, you'll find three subdirectories:
DHParser
examples
test
The directory `DHParser` contains the Python modules of the
DHParser-package, `test` - as you can guess - contains the unit-tests
for DHParser. Now, enter the `examples/Tutorial`-directory. Presently,
most other examples are pretty rudimentary. So, don't worry about them.
In this directory, you'll find a simple EBNF Grammar for poetry in the
file `Lyrik.ebnf`. Have a look at it. You'll find that it is the same
grammar (plus a few additions) that has been mentioned just before.
You'll also find a little script `recompile_grammar.py` that is used to
compile an EBNF-Grammar into an executable Python-module that can be
used to parse any piece of text that this grammar is meant for; in this
case, poetry.
Any DHParser-Project needs such a script. The content of the script is
pretty self-explanatory:
......@@ -259,7 +258,7 @@ pretty self-explanatory:
import sys, DHParser.dsl  # opening lines restored for context; cf. the grammar-test template later in this commit
if not DHParser.dsl.recompile_grammar('Lyrik.ebnf', force=False):
    with open('Lyrik_ebnf_ERRORS.txt') as f:
        print(f.read())
    sys.exit(1)
The script simply (re-)compiles any EBNF grammar that it finds in the
current directory. "Recompiling" means that DHParser notices if a
grammar has already been compiled and overwrites only that part of the
......@@ -268,7 +267,7 @@ will come to that later what these are - can safely be edited by you.
Now just run `recompile_grammar.py` from the command line:
$ python3 recompile_grammar.py
You'll find that `recompile_grammar.py` has generated a new script with
the name `LyrikCompiler.py`. This script contains the Parser for the
`Lyrik.ebnf`-grammar and some skeleton-code for a DSL->XML-Compiler (or
......@@ -276,7 +275,7 @@ rather, a DSL-whatever compiler), which you can later fill in. Now let's
see how this script works:
$ python3 LyrikCompiler.py Lyrisches_Intermezzo_IV.txt >result.xml
The file `Lyrisches_Intermezzo_IV.txt` contains the fourth part of
Heinrich Heine's Lyrisches Intermezzo encoded in our own human-readable
poetry-DSL that has been shown above. Since we have redirected the
......@@ -317,7 +316,7 @@ recognizable!) first verse of the poem:
</vers>
...
How come it is so obfuscated, and where do all those pseudo-tags like
`<:RegExp>` and `<:Whitespace>` come from? Well, this is probably the
right time to explain a bit about parsing and compilation in general.
......@@ -354,7 +353,7 @@ as the grammar has to be specified for each application domain.
Before I'll explain how to specify an AST-transformation for DHParser,
you may want to know what difference it makes. There is a script
`LyrikCompiler_example.py` in the directory where the
AST-transformations are already included. Running the script
$ python LyrikCompiler_example.py Lyrisches_Intermezzo_IV.txt
......@@ -362,7 +361,7 @@ yields the fairly clean Pseudo-XML-representation of the DSL-encoded
poem that we have seen above. Just as a teaser, you might want to look
up, how the AST-transformation is specified with DHParser. For this
purpose, you can have a look in file `LyrikCompiler_example.py`. If you
scroll down to the AST section, you'll see something like this:
Lyrik_AST_transformation_table = {
# AST Transformations for the Lyrik-grammar
......@@ -389,8 +388,8 @@ As you can see, AST-transformations are specified declaratively (with the
option to add your own Python-programmed transformation rules). This
keeps the specification of the AST-transformation simple and concise. At
the same time, we avoid adding hints for the AST-transformation in the
grammar specification, which would render the grammar less readable.
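To give a flavor of the declarative style, here is a minimal sketch (not
the actual Lyrik table; the exact rule signatures are assumptions for
illustration, using names imported elsewhere in this commit):

    from functools import partial
    from DHParser import traverse, remove_children_if, is_anonymous

    # tag names map to lists of transformation rules, which traverse()
    # applies to the syntax tree in place:
    sketch_transformation_table = {
        "vers": [partial(remove_children_if, condition=is_anonymous)],
    }
    # after parsing: traverse(syntax_tree, sketch_transformation_table)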
Next, I am going to explain step by step, how a domain specific language
for poems like Heine's Lyrisches Intermezzo can be designed, specified,
compiled and tested.
......
......@@ -7,7 +7,6 @@ specific languages (DSL) in Digital Humanities projects.
Author: Eckhart Arnold, Bavarian Academy of Sciences
Email: arnold@badw.de
License
-------
......@@ -32,18 +31,16 @@ Python 3.5 source code in order for DHParser to be backwards compatible
with Python 3.4. The module ``DHParser/foreign_typing.py`` is licensed under the
[Python Software Foundation License Version 2](https://docs.python.org/3.5/license.html)
Sources
-------
Find the sources on [gitlab.lrz.de/badw-it/DHParser](https://gitlab.lrz.de/badw-it/DHParser).
Get them with:
git clone https://gitlab.lrz.de/badw-it/DHParser
Please contact me, if you are interested in contributing to the
development or just using DHParser.
Disclaimer
----------
......@@ -55,14 +52,13 @@ function names changed in future versions. The API is NOT YET STABLE!
Use it for testing and evaluation, but not in a production environment,
or contact me first, if you intend to do so.
Purpose
-------
DHParser leverages the power of domain-specific languages for the
Digital Humanities.

Domain-specific languages are widespread in
computer sciences, but seem to be underused in the Digital Humanities.
While DSLs are sometimes introduced to Digital-Humanities-projects as
[practical adhoc-solution][Müller_2016], these solutions are often
......@@ -76,17 +72,17 @@ parser generators, but employs the more modern form called
recursive descent parser.
Why another parser generator? There are plenty of good parser
generators out there, e.g. [Añez's grako parser generator][Añez_2017],
[Eclipse XText][XText_Website]. However, DHParser is
intended as a tool that is specifically geared towards digital
humanities applications, while most existing parser generators come
from compiler construction toolkits for programming languages.
While I expect DSLs in computer science and DSLs in the Digital
Humanities to be quite similar as far as the technological realization
is concerned, the use cases, requirements and challenges are somewhat
different. For example, in the humanities annotating text is a central
use case, which is mostly absent in computer science treatments.
These differences might sooner or later require developing the
DSL-construction toolkits in a different direction. Also,
DHParser shall (in the future) serve as a teaching tool, which
influences some of its design decisions such as, for example, clearly
......@@ -113,7 +109,7 @@ Further (intended) use cases are:
Mark and Markdown also go beyond what is feasible with pure
EBNF-based-parsers.)
* EBNF itself. DHParser is already self-hosting ;-)
* Digital and cross-media editions
* Digital dictionaries
For a simple self-test run `dhparser.py` from the command line. This
......@@ -122,13 +118,11 @@ Python-based parser class representing that grammar. The concrete and
abstract syntax tree as well as a full and abbreviated log of the
parsing process will be stored in a sub-directory named "LOG".
Introduction
------------
see [Introduction.md](https://gitlab.lrz.de/badw-it/DHParser/blob/master/Introduction.md)
References
----------
......@@ -146,22 +140,19 @@ München 2016. Short-URL: [tiny.badw.de/2JVT][Arnold_2016]
[Arnold_2016]: https://f.hypotheses.org/wp-content/blogs.dir/1856/files/2016/12/EA_Pr%C3%A4sentation_Auszeichnungssprachen.pdf
Brian Ford: Parsing Expression Grammars: A Recognition-Based Syntactic
Foundation, Cambridge, Massachusetts, 2004. Short-URL: [http://t1p.de/jihs][Ford_2004]
[Ford_2004]: https://pdos.csail.mit.edu/~baford/packrat/popl04/peg-popl04.pdf
[Ford_20XX]: http://bford.info/packrat/
Richard A. Frost, Rahmatullah Hafiz and Paul Callaghan: Parser
Combinators for Ambiguous Left-Recursive Grammars, in: P. Hudak and
D.S. Warren (Eds.): PADL 2008, LNCS 4902, pp. 167–181, Springer-Verlag
Berlin Heidelberg 2008.
Dominikus Herzberg: Objekt-orientierte Parser-Kombinatoren in Python,
Blog-Post, September, 18th 2008 on denkspuren. gedanken, ideen,
anregungen und links rund um informatik-themen, short-URL:
......@@ -169,7 +160,6 @@ anregungen und links rund um informatik-themen, short-URL:
[Herzberg_2008a]: http://denkspuren.blogspot.de/2008/09/objekt-orientierte-parser-kombinatoren.html
Dominikus Herzberg: Eine einfache Grammatik für LaTeX, Blog-Post,
September, 18th 2008 on denkspuren. gedanken, ideen, anregungen und
links rund um informatik-themen, short-URL:
......@@ -177,17 +167,14 @@ links rund um informatik-themen, short-URL:
[Herzberg_2008b]: http://denkspuren.blogspot.de/2008/09/eine-einfache-grammatik-fr-latex.html
Dominikus Herzberg: Uniform Syntax, Blog-Post, February, 27th 2007 on
denkspuren. gedanken, ideen, anregungen und links rund um
informatik-themen, short-URL: [http://t1p.de/s0zk][Herzberg_2007]
[Herzberg_2007]: http://denkspuren.blogspot.de/2007/02/uniform-syntax.html
[ISO_IEC_14977]: http://www.cl.cam.ac.uk/~mgk25/iso-14977.pdf
John MacFarlane, David Greenspan, Vicent Marti, Neil Williams,
Benjamin Dumke-von der Ehe, Jeff Atwood: CommonMark. A strongly
defined, highly compatible specification of
......@@ -195,7 +182,6 @@ Markdown, 2017. [commonmark.org][MacFarlane_et_al_2017]
[MacFarlane_et_al_2017]: http://commonmark.org/
Stefan Müller: DSLs in den digitalen Geisteswissenschaften,
Präsentation auf dem
[dhmuc-Workshop: Digitale Editionen und Auszeichnungssprachen](https://dhmuc.hypotheses.org/workshop-digitale-editionen-und-auszeichnungssprachen),
......@@ -203,15 +189,15 @@ München 2016. Short-URL: [tiny.badw.de/2JVy][Müller_2016]
[Müller_2016]: https://f.hypotheses.org/wp-content/blogs.dir/1856/files/2016/12/Mueller_Anzeichnung_10_Vortrag_M%C3%BCnchen.pdf
Markus Voelter, Sebastian Benz, Christian Dietrich, Birgit Engelmann,
Mats Helander, Lennart Kats, Eelco Visser, Guido Wachsmuth:
DSL Engineering. Designing, Implementing and Using Domain-Specific Languages, 2013.
[http://dslbook.org/][Voelter_2013]
[Voelter_2013]: http://dslbook.org/
[tex_stackexchange_no_bnf]: http://tex.stackexchange.com/questions/4201/is-there-a-bnf-grammar-of-the-tex-language
[tex_stackexchange_latex_parsers]: http://tex.stackexchange.com/questions/4223/what-parsers-for-latex-mathematics-exist-outside-of-the-tex-engines
[XText_website]: https://www.eclipse.org/Xtext/
......@@ -36,9 +36,8 @@ EBNF_TEMPLATE = r"""-grammar
#
#######################################################################
@ testing = True # testing suppresses error messages for unconnected symbols
@ whitespace = vertical # implicit whitespace, includes any number of line feeds
@ literalws = right # literals have implicit whitespace on the right hand side
@ comment = /#.*/ # comments range from a '#'-character to the end of the line
@ ignorecase = False # literals and regular expressions are case-sensitive
......@@ -49,7 +48,7 @@ EBNF_TEMPLATE = r"""-grammar
#
#######################################################################
document = //~ { WORD } §EOF # root parser: a sequence of words preceded by whitespace
# until the end of file
#######################################################################
......@@ -58,25 +57,25 @@ document = //~ { WORD } §EOF # root parser: optional whitespace followed by
#
#######################################################################
WORD = /\w+/~ # a sequence of letters, optional trailing whitespace
EOF = !/./ # no more characters ahead, end of file reached
"""
TEST_WORD_TEMPLATE = r'''[match:WORD]
M1: word
M2: one_word_with_underscores
[fail:WORD]
F1: two words
'''
TEST_DOCUMENT_TEMPLATE = r'''[match:document]
1 : """This is a sequence of words
extending over several lines"""
M1: """This is a sequence of words
extending over several lines"""
[fail:document]
1 : """This test should fail, because neither
comma nor full have been defined anywhere."""
F1: """This test should fail, because neither
comma nor full have been defined anywhere."""
'''
README_TEMPLATE = """# {name}
......@@ -117,14 +116,16 @@ import DHParser.dsl
from DHParser import testing
from DHParser import toolkit
# recompiles Grammar only if it has changed
if not DHParser.dsl.recompile_grammar('{name}.ebnf', force=False):
print('\nErrors while recompiling "{name}.ebnf":\n--------------------------------------\n\n')
with open('{name}_ebnf_ERRORS.txt') as f:
print(f.read())
sys.exit(1)
sys.path.append('./')
# must be appended after module creation, because
# otherwise an ImportError is raised under Windows
from {name}Compiler import get_grammar, get_transformer
with toolkit.logging(True):
......@@ -135,7 +136,7 @@ if error_report:
print(error_report)
sys.exit(1)
else:
print('ready.')
'''
......@@ -152,7 +153,7 @@ def create_project(path: str):
print('"%s" already exists! Not overwritten.' % name)
if os.path.exists(path) and not os.path.isdir(path):
print('Cannot create new project, because a file named "%s" already exists!' % path)
sys.exit(1)
name = os.path.basename(path)
print('Creating new DHParser-project "%s".' % name)
......@@ -172,6 +173,7 @@ def create_project(path: str):
create_file(name + '.ebnf', '# ' + name + EBNF_TEMPLATE)
create_file('README.md', README_TEMPLATE.format(name=name))
create_file('tst_%s_grammar.py' % name, GRAMMAR_TEST_TEMPLATE.format(name=name))
os.chmod('tst_%s_grammar.py' % name, 0o755)
os.chdir(curr_dir)
print('ready.')
......@@ -257,5 +259,6 @@ def main():
if not cpu_profile(selftest, 1):
sys.exit(1)
if __name__ == "__main__":
main()
......@@ -66,8 +66,9 @@ class EBNFGrammar(Grammar):
factor = [flowmarker] [retrieveop] symbol !"=" # negative lookahead to be sure it's not a definition
| [flowmarker] literal
| [flowmarker] regexp
| [flowmarker] oneormore
| [flowmarker] group
| [flowmarker] unordered
| repetition
| option
......@@ -76,6 +77,7 @@ class EBNFGrammar(Grammar):
retrieveop = "::" | ":" # '::' pop, ':' retrieve
group = "(" §expression ")"
unordered = "<" §expression ">" # elements of expression in arbitrary order
oneormore = "{" expression "}+"
repetition = "{" §expression "}"
option = "[" §expression "]"
......@@ -91,7 +93,7 @@ class EBNFGrammar(Grammar):
EOF = !/./
"""
expression = Forward()
source_hash__ = "3c472b3a5d1039680c751fd2dd3f3e24"
source_hash__ = "084a572ffab147ee44ac8f2268793f63"
parser_initialization__ = "upon instantiation"
COMMENT__ = r'#.*(?:\n|$)'
WHITESPACE__ = r'\s*'
......@@ -106,10 +108,11 @@ class EBNFGrammar(Grammar):
option = Series(Token("["), expression, Token("]"), mandatory=1)
repetition = Series(Token("{"), expression, Token("}"), mandatory=1)
oneormore = Series(Token("{"), expression, Token("}+"))
unordered = Series(Token("<"), expression, Token(">"), mandatory=1)
group = Series(Token("("), expression, Token(")"), mandatory=1)
retrieveop = Alternative(Token("::"), Token(":"))
flowmarker = Alternative(Token("!"), Token("&"), Token("-!"), Token("-&"))
factor = Alternative(Series(Option(flowmarker), Option(retrieveop), symbol, NegativeLookahead(Token("="))), Series(Option(flowmarker), literal), Series(Option(flowmarker), regexp), Series(Option(flowmarker), oneormore), Series(Option(flowmarker), group), Series(Option(flowmarker), unordered), repetition, option)
term = OneOrMore(Series(Option(Token("§")), factor))
expression.set(Series(term, ZeroOrMore(Series(Token("|"), term))))
directive = Series(Token("@"), symbol, Token("="), Alternative(regexp, literal, list_), mandatory=1)
......
# EBNF-Grammar in EBNF
@ comment = /#.*(?:\n|$)/ # comments start with '#' and eat all chars up to and including '\n'
@ whitespace = /\s*/ # whitespace includes linefeed
@ literalws = right # trailing whitespace of literals will be ignored tacitly
syntax = [~//] { definition | directive } §EOF
definition = symbol §"=" §expression
directive = "@" §symbol §"=" §( regexp | literal | list_ )
expression = term { "|" term }
term = { factor }+
factor = [flowmarker] [retrieveop] symbol !"=" # negative lookahead to be sure it's not a definition
| [flowmarker] literal
| [flowmarker] regexp
| [flowmarker] group
| [flowmarker] oneormore
| repetition
| option
flowmarker = "!" | "&" | "§" # '!' negative lookahead, '&' positive lookahead, '§' required
| "-!" | "-&" # '-' negative lookbehind, '-&' positive lookbehind
retrieveop = "::" | ":" # '::' pop, ':' retrieve
group = "(" expression §")"
oneormore = "{" expression "}+"
repetition = "{" expression §"}"
option = "[" expression §"]"
symbol = /(?!\d)\w+/~ # e.g. expression, factor, parameter_list
literal = /"(?:[^"]|\\")*?"/~ # e.g. "(", '+', 'while'
| /'(?:[^']|\\')*?'/~ # whitespace following literals will be ignored tacitly.
regexp = /~?\/(?:\\\/|[^\/])*?\/~?/~ # e.g. /\w+/, ~/#.*(?:\n|$)/~
# '~' is a whitespace-marker, if present leading or trailing
# whitespace of a regular expression will be ignored tacitly.
list_ = /\w+/~ { "," /\w+/~ } # comma separated list of symbols, e.g. BEGIN_LIST, END_LIST,
# BEGIN_QUOTE, END_QUOTE ; see CommonMark/markdown.py for an example
EOF = !/./
#!/usr/bin/python
#######################################################################
#
# SYMBOLS SECTION - Can be edited. Changes will be preserved.
#
#######################################################################
from functools import partial
import os
import sys
try:
import regex as re
except ImportError:
import re
from DHParser import logging, is_filename, load_if_file, \
Grammar, Compiler, nil_preprocessor, \
Lookbehind, Lookahead, Alternative, Pop, Required, Token, Synonym, \
Option, NegativeLookbehind, OneOrMore, RegExp, Retrieve, Series, RE, Capture, \
ZeroOrMore, Forward, NegativeLookahead, mixin_comment, compile_source, \
last_value, counterpart, accumulate, PreprocessorFunc, \
Node, TransformationFunc, TransformationDict, TRUE_CONDITION, \
traverse, remove_children_if, merge_children, is_anonymous, \