badw-it / DHParser

Commit d929b790, authored Mar 04, 2021 by di68kap

    ebnf.py: documentation extended

parent 3f3fa403
9 changed files
DHParser/ebnf.py
...
...
@@ -46,11 +46,11 @@ generated by compiling the grammar.
 As an example, we will realize a json-parser (https://www.json.org/).
 Let's start with creating some test-data::

-    >>> testobj = {'array': [1, 2, "string"], 'int': 3, 'bool': False}
+    >>> testobj = {'array': [1, 2.0, "a string"], 'number': -1.3e+25, 'bool': False}
     >>> import json
     >>> testdata = json.dumps(testobj)
     >>> testdata
-    '{"array": [1, 2, "string"], "int": 3, "bool": false}'
+    '{"array": [1, 2.0, "a string"], "number": -1.3e+25, "bool": false}'

 We define the json-Grammar (see https://www.json.org/) in
 top-down manner in EBNF. We'll use a regular-expression look-alike
...
...
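The doctest change above can be reproduced with the standard library alone (a stand-in sketch; DHParser is not needed to verify what `json.dumps` produces for the new test object):

```python
import json

# The updated test data from the docstring: a float, a string with
# significant inner whitespace, and a number in exponential notation.
testobj = {'array': [1, 2.0, "a string"], 'number': -1.3e+25, 'bool': False}
testdata = json.dumps(testobj)
print(testdata)
# → {"array": [1, 2.0, "a string"], "number": -1.3e+25, "bool": false}
```

The richer test data is what exercises the `_FRAC`/`_EXP` productions changed later in this commit.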
@@ -97,7 +97,7 @@ The structure of a JSON file can easily be described in EBNF::
     'string = `"` §_CHARS `"` ~\\n' \
     ' _CHARS = /[^"\\\\\]+/ | /\\\\\\\[\/bnrt\\\\\]/\\n' \
     'number = _INT _FRAC? _EXP? ~\\n' \
-    ' _INT = `-` /[1-9][0-9]+/ | /[0-9]/\\n' \
+    ' _INT = `-`? (/[1-9][0-9]+/ | /[0-9]/)\\n' \
     ' _FRAC = `.` /[0-9]+/\\n' \
     ' _EXP = (`E`|`e`) [`+`|`-`] /[0-9]+/\\n' \
     'bool = "true" ~ | "false" ~\\n' \
...
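The `_INT` fix is a precedence fix: in EBNF (as in regular expressions), sequence binds tighter than alternation, so without parentheses the minus sign attaches only to the multi-digit branch. Plain Python regular expressions illustrate the same issue (a sketch only; the patterns below are regex analogues of the grammar rules, not DHParser code):

```python
import re

# Analogue of the old rule: [`-`] /[1-9][0-9]+/ | /[0-9]/
# The optional minus belongs to the first alternative only.
old_int = re.compile(r'(?:-?[1-9][0-9]+|[0-9])$')

# Analogue of the new rule: `-`? (/[1-9][0-9]+/ | /[0-9]/)
# Grouping the alternation lets the minus precede either branch.
new_int = re.compile(r'-?(?:[1-9][0-9]+|[0-9])$')

print(bool(old_int.match("-3")))   # single negative digits were rejected
print(bool(new_int.match("-3")))   # now accepted
```

This is exactly why the test value `-1.3e+25` was added to the test data: its integer part is a single negative digit.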
@@ -138,7 +138,7 @@ this grammar into executable Python-code, we use the high-level-function
     >>> parser = create_parser(grammar, branding="JSON")
     >>> syntax_tree = parser(testdata)
     >>> syntax_tree.content
-    '{"array": [1, 2, "string"], "int": 3, "bool": false}'
+    '{"array": [1, 2.0, "a string"], "number": -1.3e+25, "bool": false}'

 As expected, serializing the content of the resulting syntax-tree yields
 exactly the input string of the parsing process. What we cannot see here
 is that the
...
...
@@ -158,13 +158,16 @@ captures the first json-array within the syntax-tree::
         (:Whitespace " ")
         (_element
           (number
-            (_INT "2")))
+            (_INT "2")
+            (_FRAC
+              (:Text ".")
+              (:RegExp "0"))))
         (:Text ",")
         (:Whitespace " ")
         (_element
           (string
             (:Text '"')
-            (_CHARS "string")
+            (_CHARS "a string")
             (:Text '"')))
         (:Text "]"))
...
...
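The S-expression serializations in these hunks come from DHParser's `as_sxpr()`. A miniature of that rendering, assuming a simple `(tag, children-or-text)` tuple tree (illustrative stand-in, not DHParser's implementation):

```python
# Render a nested (tag, content) tuple tree as an indented S-expression,
# similar in spirit to DHParser's Node.as_sxpr(compact=True).
def as_sxpr(node, indent=0):
    tag, content = node
    pad = '  ' * indent
    if isinstance(content, str):          # leaf node: tag plus quoted text
        return f'{pad}({tag} "{content}")'
    inner = '\n'.join(as_sxpr(child, indent + 1) for child in content)
    return f'{pad}({tag}\n{inner})'

tree = ('number', [('_INT', '2'),
                   ('_FRAC', [(':Text', '.'), (':RegExp', '0')])])
print(as_sxpr(tree))
```

This reproduces the shape of the `(number (_INT "2") (_FRAC ...))` fragment shown in the diff.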
@@ -196,7 +199,8 @@ construct of the EBNF-grammar would leave a node in the syntax-tree::
       (_element
         (number
           (_INT
-            (:RegExp "1"))))
+            (:Alternative
+              (:RegExp "1")))))
       (:ZeroOrMore
         (:Series
           (:Text ",")
...
...
@@ -204,7 +208,12 @@ construct of the EBNF-grammar would leave a node in the syntax-tree::
       (_element
         (number
           (_INT
-            (:RegExp "2")))))
+            (:Alternative
+              (:RegExp "2")))
+          (:Option
+            (_FRAC
+              (:Text ".")
+              (:RegExp "0"))))))
       (:Series
         (:Text ",")
         (:Whitespace " ")
...
...
@@ -212,7 +221,7 @@ construct of the EBNF-grammar would leave a node in the syntax-tree::
           (string
             (:Text '"')
             (_CHARS
-              (:RegExp "string"))
+              (:RegExp "a string"))
             (:Text '"')))))))
       (:Text "]"))
...
...
@@ -223,16 +232,70 @@ is advisable to streamline the syntax-tree as early on as possible,
 because the processing time of all subsequent tree-processing stages
 increases with the number of nodes in the tree.

-Because of this DHParser offers further means of simplifying
+Because of this, DHParser offers further means of simplifying
 syntax-trees already during the parsing stage. These are not turned
 on by default, because they allow dropping content or removing named
 nodes from the tree; they must be turned on by "directives" that
 are listed at the top of an EBNF-grammar and that guide the
 parser-generation process. DHParser-directives always start with an
 `@`-sign. For example, the `@drop`-directive advises the parser to
-drop certain nodes ::
+drop certain nodes entirely, including their content. In the following
+example, the parser is directed to drop all insignificant whitespace::
+
+    >>> drop_insignificant_wsp = '@drop = whitespace \\n'
+
+Directives look similar to productions, except that the right-hand
+side of the equal sign is a list of parameters. In the case of the
+drop-directive these can be either the names of non-anonymous nodes
+that shall be dropped or one of four particular classes of anonymous
+nodes (`strings`, `backticked`, `regexp`, `whitespace`) that will be
+dropped.
+
+Another useful directive advises the parser to treat named nodes as
+anonymous nodes and to eliminate them accordingly during parsing. This
+is useful if we have introduced certain names in our grammar only as
+placeholders to render the definition of the grammar a bit more
+readable, not because we are interested in the text that is captured
+by the associated production in its own right::
+
+    >>> anonymize_symbols = '@ anonymous = /_\w+/ \\n'
+
+Instead of passing a comma-separated list of symbols to the directive,
+which would also have been possible, we have leveraged our convention
+of prefixing unimportant symbols with an underscore "_" by specifying
+the symbols that shall be anonymized with a regular expression.
+
+Now, let's examine the effect of these two directives::
+
+    >>> grammar = drop_insignificant_wsp + anonymize_symbols + grammar
+    >>> parser = create_parser(grammar, 'JSON')
+    >>> syntax_tree = parser(testdata)
+    >>> syntax_tree.content
+    '{"array":[1,2.0,"a string"],"number":-1.3e+25,"bool":false}'
+
+You might have noticed that this time all insignificant whitespace
+adjacent to the delimiters has been removed (though, of course, not
+the significant whitespace between "a" and "string" in "a string").
+The difference that the use of these two directives makes is even more
+obvious if we look at (a section of) the syntax-tree::
+
+    >>> print(syntax_tree.pick('array').as_sxpr(compact=True))
+    (array
+      (:Text "[")
+      (number "1")
+      (:Text ",")
+      (number
+        (:RegExp "2")
+        (:Text ".")
+        (:RegExp "0"))
+      (:Text ",")
+      (string
+        (:Text '"')
+        (:RegExp "a string")
+        (:Text '"'))
+      (:Text "]"))
 """
...
...
DHParser/parse.py
...
...
@@ -381,7 +381,8 @@ class Parser:
     tag_name: The tag_name for the nodes that are created by
         the parser. If the parser is named, this is the same as
-        `pname`, otherwise it is the name of the parser's type.
+        `pname`, otherwise it is the name of the parser's type
+        prefixed with a colon ":".

     visited: Mapping of places this parser has already been to
         during the current parsing process onto the results the
...
...
@@ -1964,7 +1965,7 @@ class CombinedParser(Parser):
                 if self.drop_content:
                     return EMPTY_NODE
                 return node
-            if node.tag_name[0] == ':':  # faster than node.is_anonymous()
+            if node.anonymous:
                 return Node(self.tag_name, node._result)
             return Node(self.tag_name, node)
         elif self.anonymous:
...
...
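The recurring change in parse.py replaces the string test `tag_name[0] == ':'` with a boolean attribute `anonymous`. The idea — compute the colon-prefix convention once and cache it as a flag — can be sketched like this (an illustrative stand-in, not DHParser's actual `Node` class):

```python
# Cache the "anonymous" property (tag name starts with ':') as a boolean
# attribute, so hot-path code tests a flag instead of indexing a string.
class Node:
    __slots__ = ('tag_name', 'anonymous', 'result')

    def __init__(self, tag_name, result):
        self.tag_name = tag_name
        self.anonymous = tag_name[0] == ':'  # computed once at creation
        self.result = result

print(Node(':RegExp', '2').anonymous)   # anonymous node
print(Node('number', '1').anonymous)    # named node
```

Besides the (minor) speed aspect, the flag makes the intent of the checks in `CombinedParser` and `Synonym` explicit instead of encoding it in a naming convention at every call site.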
@@ -1991,9 +1992,9 @@ class CombinedParser(Parser):
         nr = []  # type: List[Node]
         # flatten parse tree
         for child in results:
-            if child.children and child.tag_name[0] == ':':  # faster than c.is_anonymous():
+            if child.children and child.anonymous:  # faster than c.is_anonymous():
                 nr.extend(child.children)
-            elif child._result or child.tag_name[0] != ':':
+            elif child._result or not child.anonymous:
                 nr.append(child)
         if nr or not self.anonymous:
             return Node(self.tag_name, tuple(nr))
...
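The flattening loop above splices the children of anonymous container nodes into the parent and silently drops empty anonymous nodes. A self-contained sketch of that logic, again on `(tag, content)` tuples rather than DHParser's `Node` objects:

```python
# Flatten a result list the way CombinedParser does: children of
# anonymous nodes are spliced in; empty anonymous nodes are dropped;
# named nodes are always kept.
def flatten(children):
    flat = []
    for tag, content in children:
        is_anonymous = tag.startswith(':')
        if content and is_anonymous and isinstance(content, list):
            flat.extend(content)          # splice anonymous container
        elif content or not is_anonymous:
            flat.append((tag, content))   # keep named or non-empty node
    return flat

print(flatten([(':Series', [('number', '1'), ('number', '2')]),
               (':Whitespace', '')]))
```

The `:Series` wrapper disappears while its two `number` children survive, which is why the syntax-trees shown in ebnf.py's docstring stay shallow.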
...
@@ -3302,7 +3303,7 @@ class Synonym(UnaryParser):
         if not self.anonymous:
             if node is EMPTY_NODE:
                 return Node(self.tag_name, ''), text
-            if node.tag_name[:1] == ':':
+            if node.anonymous:
                 # eliminate anonymous child-node on the fly
                 node.tag_name = self.tag_name
             else:
...
DHParser/testing.py
...
...
@@ -392,7 +392,7 @@ def grammar_unit(test_unit, parser_factory, transformer_factory, report='REPORT'
         if not get_config_value('test_parallelization'):
             print('  Testing parser: ' + parser_name)

-        track_history = False
+        track_history = get_config_value('history_tracking')
         try:
             if has_lookahead(parser_name):
                 set_tracer(all_descendants(parser[parser_name]), trace_history)
...
...
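The testing.py change swaps a hardcoded `False` for a value read from DHParser's configuration, so history tracking in grammar tests can be switched on externally. The pattern, with a dict-based stand-in for `get_config_value` (the real function lives in `DHParser.configuration`; this stand-in is purely illustrative):

```python
# Minimal stand-in for a configuration lookup: behavior flags come from
# a central store instead of being hardcoded at the use site.
CONFIG = {'history_tracking': True, 'test_parallelization': False}

def get_config_value(key, default=False):
    return CONFIG.get(key, default)

track_history = get_config_value('history_tracking')
print(track_history)  # → True
```

With the old hardcoded `False`, the `set_preset_value('history_tracking', True)` call added to tst_json_grammar.py in this same commit would have had no effect on the unit tests.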
examples/json/json.ebnf
...
...
@@ -46,7 +46,7 @@ ESCAPE = /\\[\/bnrt\\]/ | UNICODE
 UNICODE = "\u" HEX HEX
 HEX     = /[0-9a-fA-F][0-9a-fA-F]/

-INT     = [NEG] /[1-9][0-9]+/ | /[0-9]/
+INT     = [NEG] ( /[1-9][0-9]+/ | /[0-9]/ )
 NEG     = `-`
 FRAC    = [ DOT /[0-9]+/ ]
 DOT     = `.`
...
...
examples/json/jsonParser.py
...
...
@@ -79,7 +79,7 @@ class jsonGrammar(Grammar):
     r"""Parser for a json source file.
    """
     _element = Forward()
-    source_hash__ = "daa269448372c300359c9c6875a23031"
+    source_hash__ = "bd32b246b5aa5fbdb1e18ac24d1da53b"
     anonymous__ = re.compile('_[A-Za-z]+|[A-Z]+')
     static_analysis_pending__ = []  # type: List[bool]
     parser_initialization__ = ["upon instantiation"]
...
...
@@ -90,22 +90,22 @@ class jsonGrammar(Grammar):
     wsp__ = Whitespace(WSP_RE__)
     dwsp__ = Drop(Whitespace(WSP_RE__))
     _EOF = NegativeLookahead(RegExp('.'))
-    EXP = Option(Series(Alternative(Drop(Text("E")), Drop(Text("e"))), Option(Alternative(Drop(Text("+")), Drop(Text("-")))), RegExp('[0-9]+')))
+    EXP = Option(Series(Alternative(Text("E"), Text("e")), Option(Alternative(Text("+"), Text("-"))), RegExp('[0-9]+')))
     DOT = Text(".")
     FRAC = Option(Series(DOT, RegExp('[0-9]+')))
     NEG = Text("-")
-    INT = Alternative(Series(Option(NEG), RegExp('[1-9][0-9]+')), RegExp('[0-9]'))
+    INT = Series(Option(NEG), Alternative(RegExp('[1-9][0-9]+'), RegExp('[0-9]')))
     HEX = RegExp('[0-9a-fA-F][0-9a-fA-F]')
     UNICODE = Series(Series(Drop(Text("\\u")), dwsp__), HEX, HEX)
     ESCAPE = Alternative(RegExp('\\\\[/bnrt\\\\]'), UNICODE)
     PLAIN = RegExp('[^"\\\\]+')
     _CHARACTERS = ZeroOrMore(Alternative(PLAIN, ESCAPE))
     null = Series(Text("null"), dwsp__)
-    false = Series(Drop(Text("false")), dwsp__)
-    true = Series(Drop(Text("true")), dwsp__)
+    false = Series(Text("false"), dwsp__)
+    true = Series(Text("true"), dwsp__)
     _bool = Alternative(true, false)
     number = Series(INT, FRAC, EXP, dwsp__)
-    string = Series(Drop(Text('"')), _CHARACTERS, Drop(Text('"')), dwsp__, mandatory=1)
+    string = Series(Text('"'), _CHARACTERS, Text('"'), dwsp__, mandatory=1)
     array = Series(Series(Drop(Text("[")), dwsp__), Option(Series(_element, ZeroOrMore(Series(Series(Drop(Text(",")), dwsp__), _element)))), Series(Drop(Text("]")), dwsp__), mandatory=2)
     member = Series(string, Series(Drop(Text(":")), dwsp__), _element, mandatory=1)
     object = Series(Series(Drop(Text("{")), dwsp__), member, ZeroOrMore(Series(Series(Drop(Text(",")), dwsp__), member, mandatory=1)), Series(Drop(Text("}")), dwsp__), mandatory=3)
...
...
examples/json/json_fail_tolerant.ebnf
...
...
@@ -66,7 +66,7 @@ ESCAPE = /\\[\/bnrt\\]/ | UNICODE
 UNICODE = "\u" HEX HEX
 HEX     = /[0-9a-fA-F][0-9a-fA-F]/

-INT     = [ NEG ] /[0-9]/ | /[1-9][0-9]+/
+INT     = [ NEG ] ( /[0-9]/ | /[1-9][0-9]+/ )
 NEG     = `-`
 FRAC    = [ DOT /[0-9]+/ ]
 DOT     = `.`
...
...
examples/json/test_grammar/02_test_JSON_elements.ini
...
...
@@ -110,6 +110,7 @@ M2: 1.1
 M3: 0
 M4: 1.43E+22
 M5: 20
+M6: -1.3e+25

 [ast:number]
...
...
examples/json/test_grammar/03_test_EBNF-Directives.ini (new file, 0 → 100644)
examples/json/tst_json_grammar.py
...
...
@@ -18,6 +18,7 @@ try:
     from DHParser import dsl
     import DHParser.log
     from DHParser import testing
+    from DHParser.configuration import set_config_value, access_presets, set_preset_value, finalize_presets
 except ModuleNotFoundError:
     print('Could not import DHParser. Please adjust sys.path in file '
           '"%s" manually' % __file__)
...
...
@@ -52,6 +53,10 @@ if __name__ == '__main__':
     if len(argv) > 1 and sys.argv[1] == "--debug":
         LOGGING = 'LOGS'
         del argv[1]
+    access_presets()
+    set_preset_value('history_tracking', True)
+    finalize_presets()
     DHParser.log.start_logging(LOGGING)
     if (len(argv) >= 2 and (argv[1].endswith('.ebnf') or
             os.path.splitext(argv[1])[1].lower() in testing.TEST_READERS.keys())):
         # if called with a single filename that is either an EBNF file or a known
...
...