Commit c6f0a058 authored by eckhart
- a few ameliorations

parent 3d085b24
Contributing
============
DHParser, while already fairly mature in terms of implemented features, is
still in an experimental state, where the API changes and smaller features
are added or dropped from version to version.
The best (and easiest) way to contribute at this stage is to try to
implement a small DSL with DHParser, report bugs and problems, and make
suggestions for further development. Have a look at the README.md file to
get started.
Please use the code from the git repository. Because the code still changes
quickly, any prepackaged builds may be outdated. The repository is here:
https://gitlab.lrz.de/badw-it/DHParser
Also helpful would be examples of editor and IDE support for DSLs built
with DHParser. See https://gitlab.lrz.de/badw-it/MLW-DSL/tree/master/VSCode
for one such example.
In case you are interested in getting deeper into DHParser, there are some
bigger projects below:
Ideas for further development
=============================
Testing for specific error messages
-----------------------------------
Allow testing of error reporting by extending testing.grammar_unit in such
a way that it is possible to test for specific error codes.
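
A minimal sketch of what such a test might look like, assuming the test
dictionaries read by testing.grammar_unit were extended with an
"error"-section (the section name and the notation for expected error
codes are assumptions, not the current API):

    testcases = {
        "match": {
            # test 1 must parse successfully
            "series": {"1": 'A B C D'}
        },
        "error": {
            # hypothetical: this input must fail AND report the
            # given error code
            "series": {"1": ('A B X D', 'MANDATORY_CONTINUATION')}
        }
    }
    # errata = grammar_unit(testcases, get_grammar, get_transformer)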
Better error reporting I
------------------------
A problem with error reporting consists in the fact that, at best, only
the very first parsing error is reported accurately; this error then
triggers a number of pure follow-up errors. Stopping after the first error
would mean that, in order for the user to detect all (true) errors in his
or her file, the parser would have to be run just as many times.
A possible solution could be to define reentry points that can be caught
by a regular expression and where the parsing process restarts in a
defined way.
A reentry point could be defined as a pair (regular expression, parser) or
a triple (regular expression, parent parser, parser), where "parent
parser" would be the parser in the call stack to which the parsing process
retreats before restarting.
A challenge could be to manage a clean retreat (captured variables, left
recursion stack, etc.) without making the parser guard (see ...) more
complicated.
Also, a good variety of test cases would be desirable.
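
A minimal sketch of the two forms such reentry points might take; the
tuple layout follows the description above, but the names are
illustrative, not part of the current API:

    import re

    # pair (regular expression, parser): skip ahead to the position
    # matched by the regular expression and restart the named parser
    REENTRY_PAIR = (re.compile(r';\s*'), 'statement')

    # triple (regular expression, parent parser, parser): first retreat
    # to the parent parser in the call stack, then restart as above
    REENTRY_TRIPLE = (re.compile(r'\}\s*'), 'block', 'statement')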
Better error reporting II
-------------------------
Yet another means to improve error reporting would be to supplement the
required operator "&" with its inverse, a "forbidden" operator, say "!&",
that would raise an error message if some parser matches at a place where
it really shouldn't.
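
A hypothetical example, since the operator does not exist yet: with a rule
like `identifier = !&reserved_word /\w+/`, the `!&`-part would not merely
fail to match when a reserved word appears in identifier position, but
would immediately raise an informative error message, instead of producing
a failed match (and follow-up errors) somewhere further up the grammar.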
Optimization and Enhancement: Two-way-Traversal for AST-Transformation
----------------------------------------------------------------------
AST-transformations are done via a depth-first tree-traversal, that is,
the traversal function first goes all the way up the tree to the leaf
nodes and calls the transformation routines successively on the way down.
The routines are picked from the transformation-table, which is a
dictionary mapping a Node's tag names to sequences of transformation
functions.
The rationale for depth-first is that it is easier to transform a node if
all of its children have already been transformed, i.e. simplified.
However, there are quite a few cases where depth-last would be better.
For example, if you know you are going to discard a whole branch starting
from a certain node, it is a waste to transform all the child nodes first.
As the tree is traversed anyway, there is no good reason why certain
transformation routines should not already be called on the way up. Of
course, as most routines more or less assume depth-first, we would need
two transformation tables: one for the routines that are called on the way
up, and one for the routines that are called on the way down.
This should be fairly easy to implement.
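
A minimal sketch of such a two-way traversal, assuming nodes with
`tag_name` and `children` attributes as in DHParser's Node class; the
table names are illustrative:

    def traverse_two_way(node, up_table, down_table):
        # routines from up_table are called before the children are
        # visited, i.e. "on the way up" (pre-order)
        for fn in up_table.get(node.tag_name, []):
            fn(node)
        for child in node.children:
            traverse_two_way(child, up_table, down_table)
        # routines from down_table are called after all children have
        # been transformed, i.e. "on the way down" (post-order)
        for fn in down_table.get(node.tag_name, []):
            fn(node)

A pre-order routine that empties a node's result would then automatically
spare the whole discarded branch from further transformation.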
Optimization: Early discarding of nodes
---------------------------------------
Reason: `traverse_recursive` and the `Node.result`-setter are top time
consumers!

Allow specifying parsers/nodes the results of which will be dropped right
away, so that the nodes they produce do not need to be removed during the
AST-transformations. Typical candidates would be:
1. Tokens ":Token"
2. Whitespace ":Whitespace" (in some cases)
3. empty Nodes
and basically anything that would be removed globally ("+" entry in the
AST-transformation dictionary) later anyway. A directive ("@discardable =
...") could be introduced to specify the discardables.
Challenges:
1. Discardable nodes should not even be created in the first place, to
   avoid costly object creation and the assignment of the result to the
   Node object on creation.
2. ...but discarded or discardable nodes are not the same as a
   non-matching parser. A possible solution would be to introduce a
   dummy/zombie-node that will be discarded by the calling parser, i.e.
   ZeroOrMore, Series etc.
3. Two kinds of conditions for discarding...?
4. Capture/Retrieve/Pop need the parsed data even if the node would
   otherwise be discardable (example: variable delimiters). So, either:
   a. temporarily suspend discarding via a flag on the Grammar object
      that is set and cleared by Capture/Retrieve/Pop. This means yet
      another flag has to be checked every time the decision to discard
      or not needs to be taken...
   b. statically check (i.e. check at compile time) that
      Capture/Retrieve/Pop neither directly nor indirectly call a
      discardable parser. Downside: some parsers cannot profit from the
      optimization. For example, variable delimiters (otherwise, like
      all delimiters, a good candidate for discarding) cannot be
      discarded any more.
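
A minimal sketch of the dummy/zombie-node idea from challenge 2 above; all
names are illustrative, not DHParser's API:

    class ZombieNode:
        """A single, shared placeholder, distinguishable both from a
        real node and from a non-match (None)."""
        __slots__ = ()

    ZOMBIE = ZombieNode()  # created once, never attached to the tree

    def drop_zombies(results):
        # calling parsers (ZeroOrMore, Series, etc.) would silently drop
        # the placeholder when assembling their own result tuple
        return tuple(r for r in results if r is not ZOMBIE)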
Debugging
---------
Supplement the history-recording functionality of DHParser with a tracing
debugger, i.e. a debugger that allows tracing particular parsers:
- Add a tracing parser class that - like the Forward-parser-class -
  "contains" another parser without its calls being run through the
  parser guard, but that records every call of the parser and its
  results. E.g., to trace the `option`-parser from the EBNF-parser (see
  DHParser/ebnf.py) you'd write:
  `option = Trace(Series(Token("["), expression, Token("]"), mandatory=1))`
- For the EBNF-representation a tracing-prefix could be added, say `?`,
  e.g. `option = ?("[" §expression "]")` or, alternatively,
  `?option = "[" §expression "]"`.
- Another alternative would be to add an EBNF-compiler directive, say
  `@ trace`, so one could write `@ trace = option` at the beginning of
  the EBNF-code.
  * Disadvantage: only parsers represented by symbols can be traced
    (this can always be circumvented by introducing further symbols).
  * Advantages: less clutter in the EBNF-code, and it is easier to
    switch between debugging and production code by simply commenting
    out the trace-statements at the beginning.
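
A minimal sketch of such a Trace class, assuming the calling convention of
DHParser's parser classes (a parser applied to the remaining text returns
a (node, rest)-tuple) and the existing UnaryOperator base class; the
output format is, of course, only illustrative:

    class Trace(UnaryOperator):
        """Wraps another parser and records every call, bypassing the
        parser guard in the same way as the Forward class. Sketch only."""
        def __call__(self, text):
            node, rest = self.parser(text)
            print('%s: %.20r -> %s' % (self.parser.name or '<anonymous>',
                                       str(text),
                                       'MATCH' if node else 'FAIL'))
            return node, rest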
Semantic Actions
----------------
An alternative way (instead of using Capture-Pop/Retrieve with retrieve
filters) to implement semantic actions would be to use derived classes in
place of the stock parser classes in the Grammar object. The derived
classes can easily implement semantic actions.
In order to integrate derived classes into the EBNF-based parser
generation, a directive could be implemented that either allows binding
derived classes to certain symbols, or defining substitutes for stock
parser classes, or both.
The difference between the two cases (let's call them "binding" and
"substitution" to make the distinction) is that the former only
substitutes a particular parser (`term` in the example below) while the
latter substitutes all parsers of a kind (Alternative in the examples
below).

In any case, ebnf.EBNFCompiler should be extended to generate stubs for
the respective derived classes. The base class would either be the root
parser class for the symbol or the substituted class referred to in the
directive. Furthermore, ebnf.EBNFCompiler must, of course, use the
substituting parsers in the generated Grammar class.
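
A minimal sketch of the "binding" case, again assuming the (node, rest)
calling convention of DHParser's parser classes; the class name and the
action itself are purely illustrative:

    class TermWithAction(Series):
        """Derived parser class that could be bound to the symbol `term`;
        it runs a semantic action as soon as a term has been parsed."""
        def __call__(self, text):
            node, rest = super().__call__(text)
            if node is not None:
                evaluate_term(node)  # hypothetical semantic action
            return node, rest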
Syntax proposal and example (EBNF directive)
--------------------------------------------
...
@@ -112,20 +112,21 @@ class Compiler:
         self.propagate_error_flags(node, lazy=True)
         return result

-    def set_grammar_name(self, grammar_name="", grammar_source=""):
+    def set_grammar_name(self, grammar_name: str="", grammar_source: str=""):
         """
         Changes the grammar's name and the grammar's source.

         The grammar name and the source text of the grammar are
         metadata about the grammar that do not affect the compilation
         process. Classes inheriting from `Compiler` can use this
-        information to name and annotate its output.
+        information to name and annotate its output. Returns `self`.
         """
         assert grammar_name == "" or re.match(r'\w+\Z', grammar_name)
         if not grammar_name and re.fullmatch(r'[\w/:\\]+', grammar_source):
             grammar_name = os.path.splitext(os.path.basename(grammar_source))[0]
         self.grammar_name = grammar_name
         self.grammar_source = load_if_file(grammar_source)
+        return self

     @staticmethod
     def propagate_error_flags(node: Node, lazy: bool = True) -> None:
@@ -90,7 +90,7 @@ from DHParser import logging, is_filename, load_if_file, \
     Grammar, Compiler, nil_preprocessor, PreprocessorToken, Whitespace, \
     Lookbehind, Lookahead, Alternative, Pop, Token, Synonym, AllOf, SomeOf, Unordered, \
     Option, NegativeLookbehind, OneOrMore, RegExp, Retrieve, Series, RE, Capture, \
-    ZeroOrMore, Forward, NegativeLookahead, mixin_comment, compile_source, \
+    ZeroOrMore, Forward, NegativeLookahead, Required, mixin_comment, compile_source, \
     grammar_changed, last_value, counterpart, accumulate, PreprocessorFunc, \
     Node, TransformationFunc, TransformationDict, \
     traverse, remove_children_if, merge_children, is_anonymous, \
@@ -311,8 +311,10 @@ def grammar_provider(ebnf_src: str, branding="DSL") -> Grammar:
     language defined by ``ebnf_src``.
     """
     grammar_src = compileDSL(ebnf_src, nil_preprocessor, get_ebnf_grammar(),
-                             get_ebnf_transformer(), get_ebnf_compiler(branding))
-    return compile_python_object(DHPARSER_IMPORTS + grammar_src, r'get_(?:\w+_)?grammar$')
+                             get_ebnf_transformer(), get_ebnf_compiler(branding, ebnf_src))
+    grammar_obj = compile_python_object(DHPARSER_IMPORTS + grammar_src, r'get_(?:\w+_)?grammar$')
+    grammar_obj.python_src__ = grammar_src
+    return grammar_obj
@@ -347,7 +349,8 @@ def load_compiler_suite(compiler_suite: str) -> \
     # Is there really any reasonable application case for this?
     with logging(False):
         compiler_py, messages, n = compile_source(source, None, get_ebnf_grammar(),
-                                                  get_ebnf_transformer(), get_ebnf_compiler())
+                                                  get_ebnf_transformer(),
+                                                  get_ebnf_compiler(compiler_suite, source))
     if has_errors(messages):
         raise GrammarError(only_errors(messages), source)
     preprocessor = get_ebnf_preprocessor
@@ -789,23 +789,24 @@ class EBNFCompiler(Compiler):
         # mandatory §-operator
         mandatory_marker = []
         filtered_children = []
-        i = 0
         for nd in node.children:
             if nd.parser.ptype == TOKEN_PTYPE and nd.content == "§":
-                mandatory_marker.append(i)
-                if i == 0:
-                    nd.add_error('First item of a series should not be mandatory.',
-                                 Error.WARNING)
-                elif len(mandatory_marker) > 1:
+                mandatory_marker.append(len(filtered_children))
+                # if len(filtered_children) == 0:
+                #     nd.add_error('First item of a series should not be mandatory.',
+                #                  Error.WARNING)
+                if len(mandatory_marker) > 1:
                     nd.add_error('One mandatory marker (§) sufficient to declare the '
                                  'rest of the series as mandatory.', Error.WARNING)
             else:
                 filtered_children.append(nd)
-                i += 1
         saved_result = node.result
         node.result = tuple(filtered_children)
-        custom_args = ['mandatory=%i' % mandatory_marker[0]] if mandatory_marker else []
-        compiled = self.non_terminal(node, 'Series', custom_args)
+        if len(filtered_children) == 1:
+            compiled = self.non_terminal(node, 'Required')
+        else:
+            custom_args = ['mandatory=%i' % mandatory_marker[0]] if mandatory_marker else []
+            compiled = self.non_terminal(node, 'Series', custom_args)
         node.result = saved_result
         return compiled
@@ -63,6 +63,7 @@ __all__ = ('Parser',
            'AllOf',
            'SomeOf',
            'Unordered',
+           'Required',
            'Lookahead',
            'NegativeLookahead',
            'Lookbehind',
@@ -440,6 +441,10 @@ class Grammar:
         field contains a value other than "done". A value of "done" indicates
         that the class has already been initialized.

+        python_src__: For the purpose of debugging and inspection, this field
+                can take the python src of the concrete grammar class
+                (see `dsl.grammar_provider`).
+
     Attributes:
         all_parsers__: A set of all parsers connected to this grammar object
@@ -524,6 +529,7 @@ class Grammar:
         If turned off, a recursion error will result in case of left
         recursion.
     """
+    python_src__ = ''  # type: str
     root__ = ZOMBIE_PARSER  # type: ParserBase
     # root__ must be overwritten with the root-parser by grammar subclass
     parser_initialization__ = "pending"  # type: str
@@ -566,8 +572,8 @@ class Grammar:
             if isinstance(parser, Parser) and sane_parser_name(entry):
                 if not parser.name:
                     parser._name = entry
-                if isinstance(parser, Forward) and (not parser.parser._name):
-                    parser.parser._name = entry
+                if isinstance(parser, Forward) and (not cast(Forward, parser).parser.name):
+                    cast(Forward, parser).parser._name = entry
         cls.parser_initialization__ = "done"
@@ -1596,6 +1602,10 @@ class FlowOperator(UnaryOperator):
         return bool_value


+def Required(parser: Parser) -> Parser:
+    return Series(parser, mandatory=0)
+
+
 # class Required(FlowOperator):
 #     """OBSOLETE. Use mandatory-parameter of Series-parser instead!
 #     """
@@ -311,6 +311,17 @@ class TestSeries:
         st = parser("DEAB_"); assert st.error_flag
         assert st.collect_errors()[0].code == Error.MANDATORY_CONTINUATION

+    def test_boundary_cases(self):
+        lang = """
+        document = series | §!single | /.*/
+        series = "A" "B" §"C" "D"
+        single = "E"
+        """
+        parser_class = grammar_provider(lang)
+        parser = parser_class()
+        print(parser.python_src__)
+        print(parser_class.python_src__)
+

 class TestAllOfSomeOf:
     def test_allOf_order(self):