badw-it / DHParser
Commit f30073b6, authored Oct 10, 2018 by Eckhart Arnold
Merge branch 'development' of https://gitlab.lrz.de/badw-it/DHParser into development
Parents: c7297d50, d047be52
Changes: 3 files
.gitignore
...
...
@@ -44,3 +44,4 @@ _static
_templates
.vs
OLDSTUFF
+.pytest_cache
\ No newline at end of file
Introduction.md
...
...
@@ -386,7 +386,7 @@ scroll down to the AST section, you'll see something like this:
"ZEICHENFOLGE, NZ, JAHRESZAHL": content_from_sinlge_child,
"WORT, NAME, LEERZEILE, ENDE": [],
":Whitespace": replace_content(lambda node : " "),
-":Token, :RE": content_from_sinlge_child,
+":Token": content_from_sinlge_child,
"*": replace_by_single_child
}
...
...
@@ -399,4 +399,4 @@ grammar specification, which would render the grammar less readable.
Now that you have seen how DHParser basically works, it is time to go
through the process of desining and testing a domain specific notation
step by step from the very start. Head over to the documentation in
-subdirectory and read the step by step guide.
\ No newline at end of file
+subdirectory and read the step by step guide.
documentation/StepByStepGuide.rst
...
...
@@ -637,7 +637,7 @@ can easily write your own. How does this look like? ::
"part": [],
"WORD": [],
"EOF": [],
-":_Token, :_RE": reduce_single_child,
+":Token": reduce_single_child,
"*": replace_by_single_child
}
...
...
@@ -653,8 +653,8 @@ reached the transformations for its descendant nodes have already been applied.
As you can see, the transformation-table contains an entry for every known
parser, i.e. "document", "sentence", "part", "WORD", "EOF". (If any of these are
missing in the table of your ``poetryCompiler.py``, add them now!) In the
-template you'll also find transformations for two anonymous parsers, i.e.
-":_Token" and ":_RE" as well as some curious entries such as "*" and "+". The
+template you'll also find transformations for the anonymous parser
+":Token" as well as some curious entries such as "*" and "+". The
latter are considered to be "jokers". The transformations related to the
"+"-sign will be applied on any node, before any other transformation is
applied. In this case, all empty nodes will be removed first (transformation:
...
...
@@ -722,10 +722,10 @@ Running the "poetryCompiler.py"-script on "macbeth.dsl" again, yields::
<WORD>shadow</WORD>
</part>
<:Series>
-<:_Token>
+<:Token>
<:PlainText>,</:PlainText>
<:Whitespace> </:Whitespace>
-</:_Token>
+</:Token>
<part>
<WORD>a</WORD>
...
...
...
@@ -734,11 +734,10 @@ It starts to become more readable and concise, but there are sill some oddities.
Firstly, the Tokens that deliminate parts of sentences still contain whitespace.
Secondly, if several <part>-nodes follow each other in a <sentence>-node, the
<part>-nodes after the first one are enclosed by a <:Series>-node or even a
-cascade of <:ZeroOrMore> and <:Series>-nodes. As for the <:_Token>-nodes, we
+cascade of <:ZeroOrMore> and <:Series>-nodes. As for the <:Token>-nodes, we
can do the same trick as with the WORD-nodes::
-":_Token": [remove_whitespace, reduce_single_child],
-":_RE": reduce_single_child,
+":Token": [remove_whitespace, reduce_single_child],
As to the nested structure of the <part>-nodes within the <sentence>-node, this
a rather typical case of syntactic artifacts that can be found in concrete
...
...
@@ -807,7 +806,7 @@ Now that everything is set, let's have a look at the result::
<WORD>walking</WORD>
<WORD>shadow</WORD>
</part>
-<:_Token>,</:_Token>
+<:Token>,</:Token>
<part>
<WORD>a</WORD>
<WORD>poor</WORD>
...
...
@@ -816,8 +815,8 @@ Now that everything is set, let's have a look at the result::
That is much better. There is but one slight blemish in the output: While all
nodes left a named nodes, i.e. nodes associated with a named parser, there are a
-few anonymous <:_Token> nodes. Here is a little exercise: Do away with those
-<:_Token>-nodes by replacing them by something semantically more meaningful.
+few anonymous <:Token> nodes. Here is a little exercise: Do away with those
+<:Token>-nodes by replacing them by something semantically more meaningful.
Hint: Add a new symbol "delimiter" in the grammar definition "poetry.ebnf". An
alternative strategy to extending the grammar would be to use the
``replace_parser`` operator. Which of the strategies is the better one? Explain
...
...
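Note on the rename: the hunks above replace the table keys ":_Token" and ":_RE" with ":Token" in both documents. The sketch below illustrates why a table key has to match the node name exactly. It is a simplified model, not DHParser's implementation: `expand_table`, `transformations_for`, `names`, and the stub transformations are hypothetical stand-ins. The comma-separated keys and the "+ is applied first" rule come from the guide text in the diff; treating "*" as a fallback for nodes without an entry of their own is an assumption, since the passage describing it is cut off.

```python
from typing import Callable, Dict, List, Union

Transformation = Callable[[dict], None]
TableEntry = Union[Transformation, List[Transformation]]


# Stub transformations: only their names matter for this illustration.
def remove_empty(node: dict) -> None:  # placeholder name for the "+" entry
    return None


def remove_whitespace(node: dict) -> None:
    return None


def reduce_single_child(node: dict) -> None:
    return None


def replace_by_single_child(node: dict) -> None:
    return None


def expand_table(compact: Dict[str, TableEntry]) -> Dict[str, List[Transformation]]:
    """Split comma-separated keys and normalise every entry to a list."""
    expanded: Dict[str, List[Transformation]] = {}
    for key, entry in compact.items():
        steps = list(entry) if isinstance(entry, list) else [entry]
        for name in key.split(","):
            expanded[name.strip()] = steps
    return expanded


def transformations_for(table: Dict[str, List[Transformation]],
                        tag_name: str) -> List[Transformation]:
    """Return the "+" steps first, then the tag's own steps or the "*" fallback."""
    specific = table[tag_name] if tag_name in table else table.get("*", [])
    return table.get("+", []) + specific


def names(table: Dict[str, List[Transformation]], tag_name: str) -> List[str]:
    """Function names of the steps that would run for a node called tag_name."""
    return [step.__name__ for step in transformations_for(table, tag_name)]


# Table as keyed before this commit, and as keyed after it.
old_table = expand_table({
    "+": remove_empty,
    ":_Token, :_RE": reduce_single_child,
    "*": replace_by_single_child,
})
new_table = expand_table({
    "+": remove_empty,
    ":Token": [remove_whitespace, reduce_single_child],
    "*": replace_by_single_child,
})

print(names(new_table, ":Token"))
# ['remove_empty', 'remove_whitespace', 'reduce_single_child']
print(names(old_table, ":Token"))
# ['remove_empty', 'replace_by_single_child']
```

Under these assumptions, a node named ":Token" looked up in a table that still uses the ":_Token" key silently falls through to the "*" entry, which is why the documentation examples need to track the parser's current naming.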