Commit 225f8299 authored by di68kap

Merge branch 'master' of https://gitlab.lrz.de/badw-it/DHParser

parents 23ff2da2 5618278f
@@ -9,7 +9,7 @@ The best (and easiest) way to contribute at this stage is to try to implement
 a small DSL with DHParser and report bugs and problems and make suggestions
 for further development. Have a look at the README.md-file to get started.
-Please the code from the git repository. Because code still changes quickly,
+Please, use the code from the git repository. Because code still changes quickly,
 any prepackaged builds may be outdated. The repository is here:
 https://gitlab.lrz.de/badw-it/DHParser
@@ -25,8 +25,8 @@ bigger projects, below:
 Ideas for further development
 =============================
 
-Better error reporting
-----------------------
+Better error reporting I
+------------------------
 
@@ -49,10 +49,44 @@ left recursion stack, etc. without making the parser guard (see
 
 Also, a good variety of test cases would be desirable.
 
-Optimizations
--------------
+Better error reporting II
+-------------------------
 
-**Early discarding of nodes**:
+Yet another means to improve error reporting would be to supplement
+the required operator "&" with its forbidden counterpart, say "!&",
+that would raise an error message if some parser matches at a place
+where it really shouldn't. [Add some examples here.]
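The idea can be illustrated with an invented grammar snippet (the "!&" operator is only proposed here; it does not exist in DHParser, and the rules below are made up for illustration):

```ebnf
# hypothetical: forbid a keyword in the place where a variable
# name is expected
assignment = !&keyword identifier "=" §expression
keyword    = "if" | "while" | "return"
identifier = /(?!\d)\w+/~
```

With a plain "!" the parser would merely fail to match and backtrack silently; "!&" could instead report an error such as "keyword used as a variable name" at exactly the offending position.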
+
+
+Optimization and Enhancement: Two-way-Traversal for AST-Transformation
+----------------------------------------------------------------------
+
+AST-transformations are done via a depth-first tree-traversal, that is,
+the traversal function first goes all the way up the tree to the leaf
+nodes and calls the transformation routines successively on the way
+down. The routines are picked from the transformation-table, which is a
+dictionary mapping a node's tag name to a sequence of transformation
+functions.
+
+The rationale for depth-first is that it is easier to transform a node
+if all of its children have already been transformed, i.e. simplified.
+However, there are quite a few cases where depth-last would be better.
+For example, if you know you are going to discard a whole branch starting
+from a certain node, it is a waste to transform all the child nodes
+first.
+
+As the tree is traversed anyway, there is no good reason why certain
+transformation routines should not already be called on the way up.
+Of course, as most routines more or less assume depth-first, we would
+need two transformation tables: one for the routines that are called
+on the way up and one for the routines that are called on the way
+down. This should be fairly easy to implement.
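The two-table scheme can be sketched in plain Python. This is a toy stand-in, not DHParser's actual `traverse`; the names `up_table` and `down_table` are invented, and "up" follows the text's convention of meaning "toward the leaves":

```python
class Node:
    """Minimal stand-in for DHParser's Node: just a tag and children."""
    def __init__(self, tag, children=()):
        self.tag = tag
        self.children = list(children)


def traverse(node, up_table, down_table):
    """Apply up_table routines before descending into the children and
    down_table routines only after all children have been transformed
    (the classic depth-first case)."""
    for fn in up_table.get(node.tag, []):
        fn(node)
    for child in node.children:   # children may already have been pruned
        traverse(child, up_table, down_table)
    for fn in down_table.get(node.tag, []):
        fn(node)


def discard_children(node):
    """Prune a branch before any work is wasted on its children."""
    node.children = []
```

Registering `discard_children` in `up_table` for a tag that is going to be thrown away means its subtree is never visited, which is exactly the saving described above.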
+
+Optimization: Early discarding of nodes
+---------------------------------------
+
 Reason: `traverse_recursive` and `Node.result-setter` are top time consumers!
 
 Allow to specify parsers/nodes, the result of which
...
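One way to read "early discarding" is dropping a parser's result at match time instead of building a node that an AST-transformation removes later. A rough sketch with toy combinators (entirely hypothetical, not DHParser's API; `literal`, `drop` and `EMPTY` are invented names):

```python
EMPTY = object()  # sentinel standing in for a discarded node


def literal(s):
    """Toy parser: match a literal string prefix, return (result, rest)."""
    def parse(text):
        if text.startswith(s):
            return s, text[len(s):]
        return None, text
    return parse


def drop(parser):
    """Hypothetical combinator: parse as usual, but discard the result
    immediately instead of building a node that an AST-transformation
    would have to remove later."""
    def parse(text):
        result, rest = parser(text)
        if result is None:
            return None, text
        return EMPTY, rest  # consumed input, but no node is built
    return parse
```

Whitespace and delimiter tokens are the obvious candidates, since their nodes are almost always deleted during AST-transformation anyway.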
@@ -87,11 +87,11 @@ try:
 except ImportError:
     import re
 from DHParser import logging, is_filename, load_if_file, \
-    Grammar, Compiler, nil_preprocessor, PreprocessorToken, \
+    Grammar, Compiler, nil_preprocessor, PreprocessorToken, Whitespace, \
     Lookbehind, Lookahead, Alternative, Pop, Token, Synonym, AllOf, SomeOf, Unordered, \
     Option, NegativeLookbehind, OneOrMore, RegExp, Retrieve, Series, RE, Capture, \
     ZeroOrMore, Forward, NegativeLookahead, mixin_comment, compile_source, \
-    last_value, counterpart, accumulate, PreprocessorFunc, \
+    grammar_changed, last_value, counterpart, accumulate, PreprocessorFunc, \
     Node, TransformationFunc, TransformationDict, \
     traverse, remove_children_if, merge_children, is_anonymous, \
     reduce_single_child, replace_by_single_child, replace_or_reduce, remove_whitespace, \
@@ -120,6 +120,14 @@ def compile_src(source, log_dir=''):
 if __name__ == "__main__":
     if len(sys.argv) > 1:
+        try:
+            grammar_file_name = os.path.basename(__file__).replace('Compiler.py', '.ebnf')
+            if grammar_changed({NAME}Grammar, grammar_file_name):
+                print("Grammar has changed. Please recompile Grammar first.")
+                sys.exit(1)
+        except FileNotFoundError:
+            print('Could not check for changed grammar, because grammar file "%s" was not found!'
+                  % grammar_file_name)
         file_name, log_dir = sys.argv[1], ''
         if file_name in ['-d', '--debug'] and len(sys.argv) > 2:
             file_name, log_dir = sys.argv[2], 'LOGS'
...
@@ -30,7 +30,7 @@ from functools import partial
 from DHParser.compile import CompilerError, Compiler
 from DHParser.error import Error
-from DHParser.parse import Grammar, mixin_comment, Forward, RegExp, RE, \
+from DHParser.parse import Grammar, mixin_comment, Forward, RegExp, Whitespace, RE, \
     NegativeLookahead, Alternative, Series, Option, OneOrMore, ZeroOrMore, Token
 from DHParser.preprocess import nil_preprocessor, PreprocessorFunc
 from DHParser.syntaxtree import Node, WHITESPACE_PTYPE, TOKEN_PTYPE
@@ -77,56 +77,53 @@ def get_ebnf_preprocessor() -> PreprocessorFunc:
 class EBNFGrammar(Grammar):
-    r"""
-    Parser for an EBNF source file, with this grammar::
-
-        # EBNF-Grammar in EBNF
-
-        @ comment    = /#.*(?:\n|$)/            # comments start with '#' and
-                                                # eat all chars up to and including '\n'
-        @ whitespace = /\s*/                    # whitespace includes linefeed
-        @ literalws  = right                    # trailing whitespace of literals will be
-                                                # ignored tacitly
-
-        syntax     = [~//] { definition | directive } §EOF
-        definition = symbol §"=" expression
-        directive  = "@" §symbol "=" ( regexp | literal | list_ )
-
-        expression = term { "|" term }
-        term       = { ["§"] factor }+          # "§" means all following factors mandatory
-        factor     = [flowmarker] [retrieveop] symbol !"="  # negative lookahead to be sure
-                                                            # it's not a definition
-                   | [flowmarker] literal
-                   | [flowmarker] regexp
-                   | [flowmarker] oneormore
-                   | [flowmarker] group
-                   | [flowmarker] unordered
-                   | repetition
-                   | option
-
-        flowmarker = "!"  | "&"                 # '!' negative lookahead, '&' positive lookahead
-                   | "-!" | "-&"                # '-' negative lookbehind, '-&' positive lookbehind
-        retrieveop = "::" | ":"                 # '::' pop, ':' retrieve
-
-        group      = "(" §expression ")"
-        unordered  = "<" §expression ">"        # elements of expression in arbitrary order
-        oneormore  = "{" expression "}+"
-        repetition = "{" §expression "}"
-        option     = "[" §expression "]"
-
-        symbol     = /(?!\d)\w+/~               # e.g. expression, factor, parameter_list
-        literal    = /"(?:[^"]|\\")*?"/~        # e.g. "(", '+', 'while'
-                   | /'(?:[^']|\\')*?'/~        # whitespace following literals will be ignored
-        regexp     = /~?\/(?:\\\/|[^\/])*?\/~?/~  # e.g. /\w+/, ~/#.*(?:\n|$)/~
-                                                # '~' is a whitespace-marker, if present leading
-                                                # or trailing whitespace of a regular expression
-                                                # will be ignored tacitly.
-        list_      = /\w+/~ { "," /\w+/~ }      # comma separated list of symbols,
-                                                # e.g. BEGIN_LIST, END_LIST,
-                                                # BEGIN_QUOTE, END_QUOTE
-                                                # see CommonMark/markdown.py for an example
-        EOF        = !/./
-    """
+    r"""Parser for an EBNF source file, with this grammar:
+
+        # EBNF-Grammar in EBNF
+
+        @ comment    = /#.*(?:\n|$)/            # comments start with '#' and eat all chars up to and including '\n'
+        @ whitespace = /\s*/                    # whitespace includes linefeed
+        @ literalws  = right                    # trailing whitespace of literals will be ignored tacitly
+
+        syntax     = [~//] { definition | directive } §EOF
+        definition = symbol §"=" expression
+        directive  = "@" §symbol "=" ( regexp | literal | list_ )
+
+        expression = term { "|" term }
+        term       = { ["§"] factor }+          # "§" means all following factors mandatory
+        factor     = [flowmarker] [retrieveop] symbol !"="  # negative lookahead to be sure it's not a definition
+                   | [flowmarker] literal
+                   | [flowmarker] plaintext
+                   | [flowmarker] regexp
+                   | [flowmarker] whitespace
+                   | [flowmarker] oneormore
+                   | [flowmarker] group
+                   | [flowmarker] unordered
+                   | repetition
+                   | option
+
+        flowmarker = "!"  | "&"                 # '!' negative lookahead, '&' positive lookahead
+                   | "-!" | "-&"                # '-' negative lookbehind, '-&' positive lookbehind
+        retrieveop = "::" | ":"                 # '::' pop, ':' retrieve
+
+        group      = "(" §expression ")"
+        unordered  = "<" §expression ">"        # elements of expression in arbitrary order
+        oneormore  = "{" expression "}+"
+        repetition = "{" §expression "}"
+        option     = "[" §expression "]"
+
+        symbol     = /(?!\d)\w+/~               # e.g. expression, factor, parameter_list
+        literal    = /"(?:[^"]|\\")*?"/~        # e.g. "(", '+', 'while'
+                   | /'(?:[^']|\\')*?'/~        # whitespace following literals will be ignored tacitly.
+        plaintext  = /`(?:[^"]|\\")*?`/~        # like literal but does not eat whitespace
+        regexp     = /~?\/(?:\\\/|[^\/])*?\/~?/~  # e.g. /\w+/, ~/#.*(?:\n|$)/~
+                                                # '~' is a whitespace-marker, if present leading or trailing
+                                                # whitespace of a regular expression will be ignored tacitly.
+        whitespace = /~/~                       # implicit or default whitespace
+        list_      = /\w+/~ { "," /\w+/~ }      # comma separated list of symbols, e.g. BEGIN_LIST, END_LIST,
+                                                # BEGIN_QUOTE, END_QUOTE ; see CommonMark/markdown.py for an example
+        EOF        = !/./
+    """
 
     expression = Forward()
     source_hash__ = "3fc9f5a340f560e847d9af0b61a68743"
     parser_initialization__ = "upon instantiation"
@@ -135,9 +132,12 @@ class EBNFGrammar(Grammar):
     WSP__ = mixin_comment(whitespace=WHITESPACE__, comment=COMMENT__)
     wspL__ = ''
     wspR__ = WSP__
+    whitespace__ = Whitespace(WSP__)
     EOF = NegativeLookahead(RegExp('.'))
     list_ = Series(RE('\\w+'), ZeroOrMore(Series(Token(","), RE('\\w+'))))
+    whitespace = RE('~')
     regexp = RE('~?/(?:\\\\/|[^/])*?/~?')
+    plaintext = RE('`(?:[^"]|\\\\")*?`')
     literal = Alternative(RE('"(?:[^"]|\\\\")*?"'), RE("'(?:[^']|\\\\')*?'"))
     symbol = RE('(?!\\d)\\w+')
     option = Series(Token("["), expression, Token("]"), mandatory=1)
@@ -147,18 +147,16 @@ class EBNFGrammar(Grammar):
     group = Series(Token("("), expression, Token(")"), mandatory=1)
     retrieveop = Alternative(Token("::"), Token(":"))
     flowmarker = Alternative(Token("!"), Token("&"), Token("-!"), Token("-&"))
-    factor = Alternative(
-        Series(Option(flowmarker), Option(retrieveop), symbol, NegativeLookahead(Token("="))),
-        Series(Option(flowmarker), literal), Series(Option(flowmarker), regexp),
-        Series(Option(flowmarker), oneormore), Series(Option(flowmarker), group),
-        Series(Option(flowmarker), unordered), repetition, option)
+    factor = Alternative(Series(Option(flowmarker), Option(retrieveop), symbol, NegativeLookahead(Token("="))),
+                         Series(Option(flowmarker), literal), Series(Option(flowmarker), plaintext),
+                         Series(Option(flowmarker), regexp), Series(Option(flowmarker), whitespace),
+                         Series(Option(flowmarker), oneormore), Series(Option(flowmarker), group),
+                         Series(Option(flowmarker), unordered), repetition, option)
     term = OneOrMore(Series(Option(Token("§")), factor))
     expression.set(Series(term, ZeroOrMore(Series(Token("|"), term))))
-    directive = Series(Token("@"), symbol, Token("="), Alternative(regexp, literal, list_),
-                       mandatory=1)
+    directive = Series(Token("@"), symbol, Token("="), Alternative(regexp, literal, list_), mandatory=1)
    definition = Series(symbol, Token("="), expression, mandatory=1)
-    syntax = Series(Option(RE('', wR='', wL=WSP__)), ZeroOrMore(Alternative(definition, directive)),
-                    EOF, mandatory=2)
+    syntax = Series(Option(RE('', wR='', wL=WSP__)), ZeroOrMore(Alternative(definition, directive)), EOF, mandatory=2)
     root__ = syntax
@@ -385,6 +383,7 @@ class EBNFCompiler(Compiler):
     COMMENT_KEYWORD = "COMMENT__"
     WHITESPACE_KEYWORD = "WSP__"
     RAW_WS_KEYWORD = "WHITESPACE__"
+    WHITESPACE_PARSER_KEYWORD = "whitespace__"
     RESERVED_SYMBOLS = {WHITESPACE_KEYWORD, RAW_WS_KEYWORD, COMMENT_KEYWORD}
     AST_ERROR = "Badly structured syntax tree. " \
                 "Potentially due to erroneous AST transformation."
@@ -415,7 +414,7 @@ class EBNFCompiler(Compiler):
         self.definitions = {}     # type: Dict[str, str]
         self.deferred_tasks = []  # type: List[Callable]
         self.root_symbol = ""     # type: str
-        self.directives = {'whitespace': self.WHITESPACE['horizontal'],
+        self.directives = {'whitespace': self.WHITESPACE['vertical'],
                            'comment': '',
                            'literalws': {'right'},
                            'tokens': set(),  # alt. 'preprocessor_tokens'
@@ -494,6 +493,12 @@ class EBNFCompiler(Compiler):
         return '\n'.join(compiler)
 
     def verify_transformation_table(self, transtable):
+        """
+        Checks for symbols that occur in the transformation-table but have
+        never been defined in the grammar. Usually, this kind of
+        inconsistency results from an error like a typo in the transformation
+        table.
+        """
         assert self._dirty_flag
         table_entries = set(expand_table(transtable).keys()) - {'*', '+', '~'}
         symbols = self.rules.keys()
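Stripped of DHParser specifics, the check this docstring describes amounts to a set difference. A hypothetical stand-alone sketch (the real method also expands compound table keys via `expand_table`):

```python
def undefined_table_symbols(transtable, grammar_symbols):
    """Return transformation-table keys that name no grammar symbol;
    such entries are usually typos in the table."""
    # special entries like '*', '+' and '~' are not grammar symbols
    table_entries = set(transtable.keys()) - {'*', '+', '~'}
    return table_entries - set(grammar_symbols)
```

For example, `undefined_table_symbols({'expression': [], 'terrm': []}, {'expression', 'term'})` flags the misspelled key `'terrm'`.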
@@ -528,6 +533,8 @@ class EBNFCompiler(Compiler):
         # add special fields for Grammar class
+        definitions.append((self.WHITESPACE_PARSER_KEYWORD,
+                            'Whitespace(%s)' % self.WHITESPACE_KEYWORD))
         definitions.append(('wspR__', self.WHITESPACE_KEYWORD
                             if 'right' in self.directives['literalws'] else "''"))
         definitions.append(('wspL__', self.WHITESPACE_KEYWORD
@@ -906,9 +913,13 @@ class EBNFCompiler(Compiler):
         return symbol
 
-    def on_literal(self, node) -> str:
-        return 'Token(' + node.content.replace('\\', r'\\') + ')'  # return 'Token(' + ',
-        # '.merge_children([node.result]) + ')' ?
+    def on_literal(self, node: Node) -> str:
+        return 'Token(' + node.content.replace('\\', r'\\') + ')'
+
+    def on_plaintext(self, node: Node) -> str:
+        return 'Token(' + node.content.replace('\\', r'\\').replace('`', '"') \
+               + ", wL='', wR='')"
 
     def on_regexp(self, node: Node) -> str:
@@ -942,6 +953,10 @@ class EBNFCompiler(Compiler):
         return parser + ', '.join([arg] + name) + ')'
 
+    def on_whitespace(self, node: Node) -> str:
+        return 'whitespace__'
+
     def on_list_(self, node) -> Set[str]:
         assert node.children
         return set(item.result.strip() for item in node.children)
...
@@ -203,10 +203,10 @@ class HistoryRecord:
     FAIL = "FAIL"
     Snapshot = collections.namedtuple('Snapshot', ['line', 'column', 'stack', 'status', 'text'])
 
-    COLGROUP = '<colgroup>\n<col style="width:2%"/><col style="width:2%"/><col style="width:75"/>' \
-               '<col style="width:6%"/><col style="width:15%"/>\n</colgroup>'
-    HEADINGS = ('<tr><th>L</th><th>C</th><th>parser calling sequence</th>'
-                '<th>success</th><th>text to parse</th></tr>')
+    COLGROUP = '<colgroup>\n<col style="width:2%"/><col style="width:2%"/><col ' \
+               'style="width:75%"/><col style="width:6%"/><col style="width:15%"/>\n</colgroup>'
+    HEADINGS = ('<tr><th>L</th><th>C</th><th>parser call sequence</th>'
+                '<th>success</th><th>text matched or failed</th></tr>')
     HTML_LEAD_IN = ('<!DOCTYPE html>\n'
                     '<html>\n<head>\n<meta charset="utf-8"/>\n<style>\n'
                     'td,th {font-family:monospace; '
@@ -289,7 +289,7 @@ class HistoryRecord:
     @property
     def stack(self) -> str:
-        return "->".join((p.repr if p.ptype == ':RegExp' else p.name or p.ptype)
+        return "->".join((p.repr if p.ptype in {':RegExp', ':PlainText'} else p.name or p.ptype)
                          for p in self.call_stack)
 
     @property
...
@@ -48,6 +48,7 @@ __all__ = ('Parser',
            'Grammar',
            'PreprocessorToken',
            'RegExp',
+           'Whitespace',
            'RE',
            'Token',
            'mixin_comment',
@@ -445,16 +446,18 @@ class Grammar:
         history_tracking__:  A flag indicating that the parsing history shall
                 be tracked
 
-        wsp_left_parser__:  A parser for the default left-adjacent-whitespace
-                or the :class:zombie-parser if the
-                default is empty. The default whitespace will be used by parsers
-                :class:`Token` and, if no other parsers are passed to its
-                constructor, by parser :class:`RE`.
-
-        wsp_right_parser__:  The same for the default right-adjacent-whitespace.
-                Both wsp_left_parser__ and wsp_right_parser__ merely serve the
-                purpose to avoid having to specify the default whitespace
-                explicitly every time an :class:`RE`-parser-object is created.
+        whitespace__:  A parser for the implicit optional whitespace (or the
+                :class:zombie-parser if the default is empty). The default
+                whitespace will be used by parsers :class:`Token` and, if no
+                other parsers are passed to its constructor, by parser
+                :class:`RE`. It can also be placed explicitly in the
+                EBNF-Grammar via the "~"-sign.
+
+        wsp_left_parser__:  The same as ``whitespace__`` for
+                left-adjacent-whitespace.
+
+        wsp_right_parser__:  The same as ``whitespace__`` for
+                right-adjacent-whitespace.
 
         _dirty_flag__:  A flag indicating that the Grammar has been called at
                 least once so that the parsing-variables need to be reset
@@ -591,18 +594,22 @@ class Grammar:
         # do so only arises during testing.
         self.root__ = copy.deepcopy(root) if root else copy.deepcopy(self.__class__.root__)
 
-        if self.wspL__:
-            self.wsp_left_parser__ = Whitespace(self.wspL__)  # type: ParserBase
-            self.wsp_left_parser__.grammar = self
-            self.all_parsers__.add(self.wsp_left_parser__)  # don't you forget about me...
-        else:
-            self.wsp_left_parser__ = ZOMBIE_PARSER
-        if self.wspR__:
-            self.wsp_right_parser__ = Whitespace(self.wspR__)  # type: ParserBase
-            self.wsp_right_parser__.grammar = self
-            self.all_parsers__.add(self.wsp_right_parser__)  # don't you forget about me...
-        else:
-            self.wsp_right_parser__ = ZOMBIE_PARSER
+        if self.WSP__:
+            try:
+                probe = self.whitespace__
+                assert self.whitespace__.regexp.pattern == self.WSP__
+            except AttributeError:
+                self.whitespace__ = Whitespace(self.WSP__)
+            self.whitespace__.grammar = self
+            self.all_parsers__.add(self.whitespace__)  # don't you forget about me...
+        else:
+            self.whitespace__ = ZOMBIE_PARSER
+        assert not self.wspL__ or self.wspL__ == self.WSP__
+        assert not self.wspR__ or self.wspR__ == self.WSP__
+        self.wsp_left_parser__ = self.whitespace__ if self.wspL__ else ZOMBIE_PARSER
+        self.wsp_right_parser__ = self.whitespace__ if self.wspR__ else ZOMBIE_PARSER
 
         self.root__.apply(self._add_parser__)
@@ -884,6 +891,9 @@ class PlainText(Parser):
             return Node(self, self.text, True), text[self.len:]
         return None, text
 
+    def __repr__(self):
+        return ("'%s'" if self.text.find("'") <= 0 else '"%s"') % self.text
+
 
 class RegExp(Parser):
     r"""
@@ -941,6 +951,28 @@ class Whitespace(RegExp):
     assert WHITESPACE_PTYPE == ":Whitespace"
 
+#######################################################################
+#######################################################################
+#
+# WARNING: The following code is hard to maintain, because it
+# introduces a special case, i.e. a parser with child parsers that is
+# not a descendant of the NaryOperator, and because it interacts
+# with the constructor of the Grammar class (see the instantiations of
+# the Whitespace-class there).
+#
+# That is all the more regrettable, as class RE basically just
+# introduces syntactical sugar for
+#
+#     Series(whitespace__, RegExp('something'), whitespace__)
+#
+# What to do? Throw the syntactical sugar out? :-(  Or find a more
+# robust solution for that kind of syntactical sugar? Or just leave
+# it be?
+#
+######################################################################
+######################################################################
 
 class RE(Parser):
     r"""
     Regular Expressions with optional leading or trailing whitespace.
@@ -982,9 +1014,8 @@ class RE(Parser):
         wL (str or regexp):  Left whitespace regular expression,
                 i.e. either ``None``, the empty string or a regular
                 expression (e.g. "\s*") that defines whitespace. An
-                empty string means no whitespace will be skipped,
-                ``None`` means that the default whitespace will be
-                used.
+                empty string means no whitespace will be skipped; ``None``
+                means that the default whitespace will be used.
         wR (str or regexp):  Right whitespace regular expression.
                 See above.
         name:  The optional name of the parser.
...
@@ -39,6 +39,7 @@ __all__ = ('ParserBase',
            'MockParser',
            'ZombieParser',
            'ZOMBIE_PARSER',
+           'ZOMBIE_NODE',
            'Node',
            'mock_syntax_tree',
            'flatten_sxpr')
@@ -724,6 +725,9 @@ class Node(collections.abc.Sized):
         return sum(child.tree_size() for child in self.children) + 1
 
 
+ZOMBIE_NODE = Node(ZOMBIE_PARSER, '')
+
+
 def mock_syntax_tree(sxpr):
     """
     Generates a tree of nodes from an S-expression. The main purpose of this is
...
@@ -30,7 +30,7 @@ for CST -> AST transformations.
 import inspect
 from functools import partial, reduce, singledispatch
 
-from DHParser.syntaxtree import Node, WHITESPACE_PTYPE, TOKEN_PTYPE, MockParser
+from DHParser.syntaxtree import Node, WHITESPACE_PTYPE, TOKEN_PTYPE, MockParser, ZOMBIE_NODE
 from DHParser.toolkit import expand_table, smart_list, re, typing
 from typing import AbstractSet, Any, ByteString, Callable, cast, Container, Dict, \
     List, Sequence, Union, Text
@@ -108,7 +108,7 @@ def transformation_factory(t1=None, t2=None, t3=None, t4=None, t5=None):
     dispatch on the first parameter after the context parameter.
 
     Decorating a transformation-function that has more than merely the
-    ``node``-parameter with ``transformation_factory`` creates a
+    ``context``-parameter with ``transformation_factory`` creates a
     function with the same name, which returns a partial-function that
     takes just the context-parameter.
@@ -158,7 +158,7 @@ def transformation_factory(t1=None, t2=None, t3=None, t4=None, t5=None):
     f = singledispatch(f)
     try:
         if len(params) == 1 and issubclass(p1type, Container) \