Commit 072c5d19 authored by Eckhart Arnold

Merge remote-tracking branch 'origin/development' into development

parents c13f63ea 6b11e2aa
@@ -1622,14 +1622,12 @@ class EBNFGrammar(Grammar):
 EBNF-definition of the Grammar::
 @ comment = /(?!#x[A-Fa-f0-9])#.*(?:\n|$)|\/\*(?:.|\n)*?\*\/|\(\*(?:.|\n)*?\*\)/
 # comments can be either C-Style: /* ... */
 # or pascal/modula/oberon-style: (* ... *)
-# or python-style: # ... \n,
-# excluding, however, character markers: #x20
-@ whitespace = /\s*/ # whitespace includes linefeeds
+# or python-style: # ... \n, excluding, however, character markers: #x20
+@ whitespace = /\s*/ # whitespace includes linefeed
 @ literalws = right # trailing whitespace of literals will be ignored tacitly
-@ disposable = pure_elem, countable, FOLLOW_UP, SYM_REGEX, ANY_SUFFIX, EOF
+@ disposable = component, pure_elem, countable, FOLLOW_UP, SYM_REGEX, ANY_SUFFIX, EOF
 @ drop = whitespace, EOF # do not include these even in the concrete syntax tree
 @ RNG_BRACE_filter = matching_bracket() # filter or transform content of RNG_BRACE on retrieve
@@ -1638,6 +1636,7 @@ class EBNFGrammar(Grammar):
 @ definition_resume = /\n\s*(?=@|\w+\w*\s*=)/
 @ directive_resume = /\n\s*(?=@|\w+\w*\s*=)/
 # specialized error messages for certain cases
 @ definition_error = /,/, 'Delimiter "," not expected in definition!\nEither this was meant to '
@@ -1651,10 +1650,10 @@ class EBNFGrammar(Grammar):
 syntax = ~ { definition | directive } EOF
 definition = symbol §:DEF~ [ :OR~ ] expression :ENDL~ & FOLLOW_UP # [:OR~] to support v. Rossum's syntax
-directive = "@" §symbol "=" (regexp | literals | procedure | symbol !DEF)
-{ "," (regexp | literals | procedure | symbol !DEF) } & FOLLOW_UP
+directive = "@" §symbol "=" component { "," component } & FOLLOW_UP
+component = literals | procedure | expression
 literals = { literal }+ # string chaining, only allowed in directives!
 procedure = SYM_REGEX "()" # procedure name, only allowed in directives!
 FOLLOW_UP = `@` | symbol | EOF
@@ -1663,7 +1662,7 @@ class EBNFGrammar(Grammar):
 expression = sequence { :OR~ sequence }
 sequence = ["§"] ( interleave | lookaround ) # "§" means all following terms mandatory
-{ :AND~ ["§"] ( interleave | lookaround ) }
+{ !`@` !(symbol :DEF) :AND~ ["§"] ( interleave | lookaround ) }
 interleave = difference { "°" ["§"] difference }
 lookaround = flowmarker § (oneormore | pure_elem)
 difference = term ["-" § (oneormore | pure_elem)]
@@ -1691,7 +1690,7 @@ class EBNFGrammar(Grammar):
 #: flow-operators
 flowmarker = "!" | "&" # '!' negative lookahead, '&' positive lookahead
-| "<-!" | "<-&" # '<-' negative lookbehind, '<-&' positive lookbehind
+| "<-!" | "<-&" # '<-!' negative lookbehind, '<-&' positive lookbehind
 retrieveop = "::" | ":?" | ":" # '::' pop, ':?' optional pop, ':' retrieve
@@ -1729,7 +1728,7 @@ class EBNFGrammar(Grammar):
 EOF = !/./ [:?DEF] [:?OR] [:?AND] [:?ENDL] # [:?DEF], [:?OR], ... clear stack by eating stored value
 [:?RNG_DELIM] [:?BRACE_SIGN] [:?CH_LEADIN] [:?TIMES] [:?RE_LEADIN] [:?RE_LEADOUT]
-DEF = `=` | `:=` | `::=` | `<-` | /:\n/ | `: ` # if `: `, retrieve marker mustn't be followed by blank!
+DEF = `=` | `:=` | `::=` | `<-` | /:\n/ | `: ` # with `: `, retrieve markers mustn't be followed by a blank!
 OR = `|` | `/` !regex_heuristics
 AND = `,` | ``
 ENDL = `;` | ``
@@ -1766,17 +1765,11 @@ class EBNFGrammar(Grammar):
 countable = Forward()
 element = Forward()
 expression = Forward()
-source_hash__ = "3bda01686407a47a9fd0a709bda53ae3"
+source_hash__ = "c76fcc24e5077d4e150b771e6b60f0a1"
 disposable__ = re.compile('component$|pure_elem$|countable$|FOLLOW_UP$|SYM_REGEX$|ANY_SUFFIX$|EOF$')
 static_analysis_pending__ = [] # type: List[bool]
 parser_initialization__ = ["upon instantiation"]
-error_messages__ = {'definition': [
-(re.compile(r','),
-'Delimiter "," not expected in definition!\\nEither this was meant to be a directive '
-'and the directive symbol @ is missing\\nor the error is due to inconsistent use of the '
-'comma as a delimiter\\nfor the elements of a sequence.')]}
-resume_rules__ = {'definition': [re.compile(r'\n\s*(?=@|\w+\w*\s*=)')],
-'directive': [re.compile(r'\n\s*(?=@|\w+\w*\s*=)')]}
+error_messages__ = {'definition': [(re.compile(r','), 'Delimiter "," not expected in definition!\\nEither this was meant to be a directive and the directive symbol @ is missing\\nor the error is due to inconsistent use of the comma as a delimiter\\nfor the elements of a sequence.')]}
 COMMENT__ = r'(?!#x[A-Fa-f0-9])#.*(?:\n|$)|\/\*(?:.|\n)*?\*\/|\(\*(?:.|\n)*?\*\)'
 comment_rx__ = re.compile(COMMENT__)
 WHITESPACE__ = r'\s*'
@@ -1787,15 +1780,8 @@ class EBNFGrammar(Grammar):
 SYM_REGEX = RegExp('(?!\\d)\\w+')
 RE_CORE = RegExp('(?:(?<!\\\\)\\\\(?:/)|[^/])*')
 regex_heuristics = Alternative(RegExp('[^ ]'), RegExp('[^/\\n*?+\\\\]*[*?+\\\\][^/\\n]/'))
-literal_heuristics = Alternative(RegExp('~?\\s*"(?:[\\\\]\\]|[^\\]]|[^\\\\]\\[[^"]*)*"'),
-RegExp("~?\\s*'(?:[\\\\]\\]|[^\\]]|[^\\\\]\\[[^']*)*'"),
-RegExp('~?\\s*`(?:[\\\\]\\]|[^\\]]|[^\\\\]\\[[^`]*)*`'),
-RegExp('~?\\s*´(?:[\\\\]\\]|[^\\]]|[^\\\\]\\[[^´]*)*´'),
-RegExp('~?\\s*/(?:[\\\\]\\]|[^\\]]|[^\\\\]\\[[^/]*)*/'))
-char_range_heuristics = NegativeLookahead(Alternative(
-RegExp('[\\n\\t ]'), Series(dwsp__, literal_heuristics),
-Series(Option(Alternative(Text("::"), Text(":?"), Text(":"))),
-SYM_REGEX, RegExp('\\s*\\]'))))
+literal_heuristics = Alternative(RegExp('~?\\s*"(?:[\\\\]\\]|[^\\]]|[^\\\\]\\[[^"]*)*"'), RegExp("~?\\s*'(?:[\\\\]\\]|[^\\]]|[^\\\\]\\[[^']*)*'"), RegExp('~?\\s*`(?:[\\\\]\\]|[^\\]]|[^\\\\]\\[[^`]*)*`'), RegExp('~?\\s*´(?:[\\\\]\\]|[^\\]]|[^\\\\]\\[[^´]*)*´'), RegExp('~?\\s*/(?:[\\\\]\\]|[^\\]]|[^\\\\]\\[[^/]*)*/'))
+char_range_heuristics = NegativeLookahead(Alternative(RegExp('[\\n\\t ]'), Series(dwsp__, literal_heuristics), Series(Option(Alternative(Text("::"), Text(":?"), Text(":"))), SYM_REGEX, RegExp('\\s*\\]'))))
 CH_LEADIN = Capture(Alternative(Text("0x"), Text("#x")))
 RE_LEADOUT = Capture(Text("/"))
 RE_LEADIN = Capture(Alternative(Series(Text("/"), Lookahead(regex_heuristics)), Text("^/")))
@@ -1806,87 +1792,46 @@ class EBNFGrammar(Grammar):
 ENDL = Capture(Alternative(Text(";"), Text("")))
 AND = Capture(Alternative(Text(","), Text("")))
 OR = Capture(Alternative(Text("|"), Series(Text("/"), NegativeLookahead(regex_heuristics))))
-DEF = Capture(Alternative(Text("="), Text(":="), Text("::="),
-Text("<-"), RegExp(':\\n'), Text(": ")))
-EOF = Drop(Drop(Series(Drop(NegativeLookahead(RegExp('.'))),
-Drop(Option(Drop(Pop(DEF, match_func=optional_last_value)))),
-Drop(Option(Drop(Pop(OR, match_func=optional_last_value)))),
-Drop(Option(Drop(Pop(AND, match_func=optional_last_value)))),
-Drop(Option(Drop(Pop(ENDL, match_func=optional_last_value)))),
-Drop(Option(Drop(Pop(RNG_DELIM, match_func=optional_last_value)))),
-Drop(Option(Drop(Pop(BRACE_SIGN, match_func=optional_last_value)))),
-Drop(Option(Drop(Pop(CH_LEADIN, match_func=optional_last_value)))),
-Drop(Option(Drop(Pop(TIMES, match_func=optional_last_value)))),
-Drop(Option(Drop(Pop(RE_LEADIN, match_func=optional_last_value)))),
-Drop(Option(Drop(Pop(RE_LEADOUT, match_func=optional_last_value)))))))
+DEF = Capture(Alternative(Text("="), Text(":="), Text("::="), Text("<-"), RegExp(':\\n'), Text(": ")))
+EOF = Drop(Series(Drop(NegativeLookahead(RegExp('.'))), Drop(Option(Drop(Pop(DEF, match_func=optional_last_value)))), Drop(Option(Drop(Pop(OR, match_func=optional_last_value)))), Drop(Option(Drop(Pop(AND, match_func=optional_last_value)))), Drop(Option(Drop(Pop(ENDL, match_func=optional_last_value)))), Drop(Option(Drop(Pop(RNG_DELIM, match_func=optional_last_value)))), Drop(Option(Drop(Pop(BRACE_SIGN, match_func=optional_last_value)))), Drop(Option(Drop(Pop(CH_LEADIN, match_func=optional_last_value)))), Drop(Option(Drop(Pop(TIMES, match_func=optional_last_value)))), Drop(Option(Drop(Pop(RE_LEADIN, match_func=optional_last_value)))), Drop(Option(Drop(Pop(RE_LEADOUT, match_func=optional_last_value))))))
 whitespace = Series(RegExp('~'), dwsp__)
 any_char = Series(Text("."), dwsp__)
 free_char = Alternative(RegExp('[^\\n\\[\\]\\\\]'), RegExp('\\\\[nrt`´\'"(){}\\[\\]/\\\\]'))
 character = Series(Retrieve(CH_LEADIN), HEXCODE)
-char_range = Series(Text("["), Lookahead(char_range_heuristics), Option(Text("^")),
-Alternative(character, free_char),
-ZeroOrMore(Alternative(Series(Option(Text("-")), character), free_char)),
-Series(Text("]"), dwsp__))
+char_range = Series(Text("["), Lookahead(char_range_heuristics), Option(Text("^")), Alternative(character, free_char), ZeroOrMore(Alternative(Series(Option(Text("-")), character), free_char)), Series(Text("]"), dwsp__))
 regexp = Series(Retrieve(RE_LEADIN), RE_CORE, Retrieve(RE_LEADOUT), dwsp__)
-plaintext = Alternative(Series(RegExp('`(?:(?<!\\\\)\\\\`|[^`])*?`'), dwsp__),
-Series(RegExp('´(?:(?<!\\\\)\\\\´|[^´])*?´'), dwsp__))
-literal = Alternative(Series(RegExp('"(?:(?<!\\\\)\\\\"|[^"])*?"'), dwsp__),
-Series(RegExp("'(?:(?<!\\\\)\\\\'|[^'])*?'"), dwsp__))
+plaintext = Alternative(Series(RegExp('`(?:(?<!\\\\)\\\\`|[^`])*?`'), dwsp__), Series(RegExp('´(?:(?<!\\\\)\\\\´|[^´])*?´'), dwsp__))
+literal = Alternative(Series(RegExp('"(?:(?<!\\\\)\\\\"|[^"])*?"'), dwsp__), Series(RegExp("'(?:(?<!\\\\)\\\\'|[^'])*?'"), dwsp__))
 symbol = Series(SYM_REGEX, dwsp__)
 multiplier = Series(RegExp('[1-9]\\d*'), dwsp__)
-no_range = Alternative(NegativeLookahead(multiplier),
-Series(Lookahead(multiplier), Retrieve(TIMES)))
-range = Series(RNG_BRACE, dwsp__, multiplier,
-Option(Series(Retrieve(RNG_DELIM), dwsp__, multiplier)),
-Pop(RNG_BRACE, match_func=matching_bracket), dwsp__)
-counted = Alternative(Series(countable, range),
-Series(countable, Retrieve(TIMES), dwsp__, multiplier),
-Series(multiplier, Retrieve(TIMES), dwsp__, countable, mandatory=3))
-option = Alternative(Series(NegativeLookahead(char_range), Series(Text("["), dwsp__),
-expression, Series(Text("]"), dwsp__), mandatory=2),
-Series(element, Series(Text("?"), dwsp__)))
-repetition = Alternative(Series(Series(Text("{"), dwsp__), no_range,
-expression, Series(Text("}"), dwsp__), mandatory=2),
-Series(element, Series(Text("*"), dwsp__), no_range))
-oneormore = Alternative(Series(Series(Text("{"), dwsp__), no_range, expression,
-Series(Text("}+"), dwsp__)),
-Series(element, Series(Text("+"), dwsp__)))
-group = Series(Series(Text("("), dwsp__), no_range,
-expression, Series(Text(")"), dwsp__), mandatory=2)
-retrieveop = Alternative(Series(Text("::"), dwsp__),
-Series(Text(":?"), dwsp__),
-Series(Text(":"), dwsp__))
-flowmarker = Alternative(Series(Text("!"), dwsp__), Series(Text("&"), dwsp__),
-Series(Text("<-!"), dwsp__), Series(Text("<-&"), dwsp__))
+no_range = Alternative(NegativeLookahead(multiplier), Series(Lookahead(multiplier), Retrieve(TIMES)))
+range = Series(RNG_BRACE, dwsp__, multiplier, Option(Series(Retrieve(RNG_DELIM), dwsp__, multiplier)), Pop(RNG_BRACE, match_func=matching_bracket), dwsp__)
+counted = Alternative(Series(countable, range), Series(countable, Retrieve(TIMES), dwsp__, multiplier), Series(multiplier, Retrieve(TIMES), dwsp__, countable, mandatory=3))
+option = Alternative(Series(NegativeLookahead(char_range), Series(Text("["), dwsp__), expression, Series(Text("]"), dwsp__), mandatory=2), Series(element, Series(Text("?"), dwsp__)))
+repetition = Alternative(Series(Series(Text("{"), dwsp__), no_range, expression, Series(Text("}"), dwsp__), mandatory=2), Series(element, Series(Text("*"), dwsp__), no_range))
+oneormore = Alternative(Series(Series(Text("{"), dwsp__), no_range, expression, Series(Text("}+"), dwsp__)), Series(element, Series(Text("+"), dwsp__)))
+group = Series(Series(Text("("), dwsp__), no_range, expression, Series(Text(")"), dwsp__), mandatory=2)
+retrieveop = Alternative(Series(Text("::"), dwsp__), Series(Text(":?"), dwsp__), Series(Text(":"), dwsp__))
+flowmarker = Alternative(Series(Text("!"), dwsp__), Series(Text("&"), dwsp__), Series(Text("<-!"), dwsp__), Series(Text("<-&"), dwsp__))
 ANY_SUFFIX = RegExp('[?*+]')
-element.set(Alternative(Series(Option(retrieveop), symbol, NegativeLookahead(Retrieve(DEF))),
-literal, plaintext, regexp, char_range, Series(character, dwsp__),
-any_char, whitespace, group))
+literals = OneOrMore(literal)
 pure_elem = Series(element, NegativeLookahead(ANY_SUFFIX), mandatory=1)
-countable.set(Alternative(option, oneormore, element))
+procedure = Series(SYM_REGEX, Series(Text("()"), dwsp__))
 term = Alternative(oneormore, counted, repetition, option, pure_elem)
-difference = Series(term, Option(Series(Series(Text("-"), dwsp__),
-Alternative(oneormore, pure_elem), mandatory=1)))
+difference = Series(term, Option(Series(Series(Text("-"), dwsp__), Alternative(oneormore, pure_elem), mandatory=1)))
 lookaround = Series(flowmarker, Alternative(oneormore, pure_elem), mandatory=1)
-interleave = Series(difference, ZeroOrMore(Series(Series(Text("°"), dwsp__),
-Option(Series(Text("§"), dwsp__)),
-difference)))
-sequence = Series(Option(Series(Text("§"), dwsp__)), Alternative(interleave, lookaround),
-ZeroOrMore(Series(Retrieve(AND), dwsp__, Option(Series(Text("§"), dwsp__)),
-Alternative(interleave, lookaround))))
-expression.set(Series(sequence, ZeroOrMore(Series(Retrieve(OR), dwsp__, sequence))))
+interleave = Series(difference, ZeroOrMore(Series(Series(Text("°"), dwsp__), Option(Series(Text("§"), dwsp__)), difference)))
+sequence = Series(Option(Series(Text("§"), dwsp__)), Alternative(interleave, lookaround), ZeroOrMore(Series(NegativeLookahead(Text("@")), NegativeLookahead(Series(symbol, Retrieve(DEF))), Retrieve(AND), dwsp__, Option(Series(Text("§"), dwsp__)), Alternative(interleave, lookaround))))
 FOLLOW_UP = Alternative(Text("@"), symbol, EOF)
-procedure = Series(SYM_REGEX, Series(Text("()"), dwsp__))
-literals = OneOrMore(literal)
-component = Alternative(regexp, literals, procedure, Series(symbol, NegativeLookahead(DEF)))
-directive = Series(
-Series(Text("@"), dwsp__), symbol, Series(Text("="), dwsp__),
-Alternative(Series(component, ZeroOrMore(Series(Series(Text(","), dwsp__), component))),
-expression),
-Lookahead(FOLLOW_UP), mandatory=1)
-definition = Series(symbol, Retrieve(DEF), dwsp__, Option(Series(Retrieve(OR), dwsp__)),
-expression, Retrieve(ENDL), dwsp__, Lookahead(FOLLOW_UP), mandatory=1)
+definition = Series(symbol, Retrieve(DEF), dwsp__, Option(Series(Retrieve(OR), dwsp__)), expression, Retrieve(ENDL), dwsp__, Lookahead(FOLLOW_UP), mandatory=1)
+component = Alternative(literals, procedure, expression)
+directive = Series(Series(Text("@"), dwsp__), symbol, Series(Text("="), dwsp__), component, ZeroOrMore(Series(Series(Text(","), dwsp__), component)), Lookahead(FOLLOW_UP), mandatory=1)
+element.set(Alternative(Series(Option(retrieveop), symbol, NegativeLookahead(Retrieve(DEF))), literal, plaintext, regexp, char_range, Series(character, dwsp__), any_char, whitespace, group))
+countable.set(Alternative(option, oneormore, element))
+expression.set(Series(sequence, ZeroOrMore(Series(Retrieve(OR), dwsp__, sequence))))
 syntax = Series(dwsp__, ZeroOrMore(Alternative(definition, directive)), EOF)
+resume_rules__ = {'definition': [re.compile(r'\n\s*(?=@|\w+\w*\s*=)')],
+'directive': [re.compile(r'\n\s*(?=@|\w+\w*\s*=)')]}
 root__ = syntax
 def __init__(self, root: Parser = None, static_analysis: Optional[bool] = None) -> None:
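In `EBNFGrammar`, a slash is ambiguous: `OR` accepts `/` only if `regex_heuristics` does not match right after it, whereas `RE_LEADIN` accepts `/` only if it does. The following minimal sketch imitates that decision with Python's standard `re` module. It assumes that DHParser's `Alternative(RegExp(...), RegExp(...))` behaves like trying each pattern, anchored, at the current position. The two pattern strings are copied from the generated code above; the functions `regex_heuristics` and `classify_slash` are hypothetical helpers for illustration, not part of DHParser's API.

```python
import re

# Pattern strings copied from the generated EBNFGrammar:
#   regex_heuristics = Alternative(RegExp('[^ ]'), RegExp('[^/\\n*?+\\\\]*[*?+\\\\][^/\\n]/'))
REGEX_HEURISTICS = (
    re.compile('[^ ]'),
    re.compile('[^/\\n*?+\\\\]*[*?+\\\\][^/\\n]/'),
)


def regex_heuristics(text: str, pos: int) -> bool:
    """Hypothetical emulation of the Alternative: succeed if any pattern
    matches at ``pos`` (anchored there, like a PEG RegExp parser)."""
    return any(rx.match(text, pos) for rx in REGEX_HEURISTICS)


def classify_slash(text: str, pos: int) -> str:
    """Hypothetical helper: tell how the '/' at ``pos`` would be read.

    OR        = `|` | `/` !regex_heuristics   -> 'or'
    RE_LEADIN = `/` &regex_heuristics | `^/`  -> 'regexp'
    """
    if text[pos] != '/':
        raise ValueError(f'no slash at position {pos}')
    # The heuristics are applied to the text right after the slash.
    return 'regexp' if regex_heuristics(text, pos + 1) else 'or'


if __name__ == '__main__':
    for src in ('a / b', 'a /b', 'x = /[a-z]+/', 'x = / \\d/'):
        print(repr(src), '->', classify_slash(src, src.index('/')))
```

Under these assumptions, `a / b` is read as an alternative, while `a /b` is already taken as the start of a regular expression, because the first pattern accepts any non-blank character after the slash. The second pattern lets a regular expression that begins with a blank, such as `/ \d/`, still be recognized.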
@@ -1968,21 +1913,18 @@ class FixedEBNFGrammar(Grammar):
 @ comment = /(?!#x[A-Fa-f0-9])#.*(?:\n|$)|\/\*(?:.|\n)*?\*\/|\(\*(?:.|\n)*?\*\)/
 # comments can be either C-Style: /* ... */
 # or pascal/modula/oberon-style: (* ... *)
-# or python-style: # ... \n,
-# excluding, however, character markers: #x20
-@ whitespace = /\s*/ # whitespace includes linefeeds
+# or python-style: # ... \n, excluding, however, character markers: #x20
+@ whitespace = /\s*/ # whitespace includes linefeed
 @ literalws = right # trailing whitespace of literals will be ignored tacitly
 @ disposable = component, pure_elem, countable, FOLLOW_UP, SYM_REGEX, ANY_SUFFIX, EOF
 @ drop = whitespace, EOF # do not include these even in the concrete syntax tree
 @ RNG_BRACE_filter = matching_bracket() # filter or transform content of RNG_BRACE on retrieve
 # re-entry-rules for resuming after parsing-error
 @ definition_resume = /\n\s*(?=@|\w+\w*\s*=)/
 @ directive_resume = /\n\s*(?=@|\w+\w*\s*=)/
 # specialized error messages for certain cases
 @ definition_error = /,/, 'Delimiter "," not expected in definition!\nEither this was meant to '
@@ -1990,14 +1932,13 @@ class FixedEBNFGrammar(Grammar):
 'due to inconsistent use of the comma as a delimiter\nfor the elements '
 'of a sequence.'
 #: top-level
 syntax = ~ { definition | directive } EOF
 definition = symbol §DEF~ [ OR~ ] expression ENDL~ & FOLLOW_UP # [OR~] to support v. Rossum's syntax
-directive = "@" §symbol "=" ( component { "," component } | expression ) & FOLLOW_UP
-component = (regexp | literals | procedure | symbol !DEF)
+directive = "@" §symbol "=" component { "," component } & FOLLOW_UP
+component = literals | procedure | expression
 literals = { literal }+ # string chaining, only allowed in directives!
 procedure = SYM_REGEX "()" # procedure name, only allowed in directives!
@@ -2111,19 +2052,11 @@ class FixedEBNFGrammar(Grammar):
 countable = Forward()
 element = Forward()
 expression = Forward()
-source_hash__ = "d0735678e82e6d7cbf75958080a607ff"
+source_hash__ = "d39bd97362e79f1a15bdca37c067d78b"
 disposable__ = re.compile('component$|pure_elem$|countable$|FOLLOW_UP$|SYM_REGEX$|ANY_SUFFIX$|EOF$')
 static_analysis_pending__ = [] # type: List[bool]
 parser_initialization__ = ["upon instantiation"]
-error_messages__ = {
-'definition':
-[(re.compile(r','),
-'Delimiter "," not expected in definition!\\n'
-'Either this was meant to be a directive and the directive symbol @ is missing\\n'
-'or the error is due to inconsistent use of the comma as a delimiter\\n'
-'for the elements of a sequence.')]}
-resume_rules__ = {'definition': [re.compile(r'\n\s*(?=@|\w+\w*\s*=)')],
-'directive': [re.compile(r'\n\s*(?=@|\w+\w*\s*=)')]}
+error_messages__ = {'definition': [(re.compile(r','), 'Delimiter "," not expected in definition!\\nEither this was meant to be a directive and the directive symbol @ is missing\\nor the error is due to inconsistent use of the comma as a delimiter\\nfor the elements of a sequence.')]}
 COMMENT__ = r'(?!#x[A-Fa-f0-9])#.*(?:\n|$)|\/\*(?:.|\n)*?\*\/|\(\*(?:.|\n)*?\*\)'
 comment_rx__ = re.compile(COMMENT__)
 WHITESPACE__ = r'\s*'
@@ -2133,6 +2066,9 @@ class FixedEBNFGrammar(Grammar):
 HEXCODE = RegExp('[A-Fa-f0-9]{1,8}')
 SYM_REGEX = RegExp('(?!\\d)\\w+')
 RE_CORE = RegExp('(?:(?<!\\\\)\\\\(?:/)|[^/])*')
+regex_heuristics = Alternative(RegExp('[^ ]'), RegExp('[^/\\n*?+\\\\]*[*?+\\\\][^/\\n]/'))
+literal_heuristics = Alternative(RegExp('~?\\s*"(?:[\\\\]\\]|[^\\]]|[^\\\\]\\[[^"]*)*"'), RegExp("~?\\s*'(?:[\\\\]\\]|[^\\]]|[^\\\\]\\[[^']*)*'"), RegExp('~?\\s*`(?:[\\\\]\\]|[^\\]]|[^\\\\]\\[[^`]*)*`'), RegExp('~?\\s*´(?:[\\\\]\\]|[^\\]]|[^\\\\]\\[[^´]*)*´'), RegExp('~?\\s*/(?:[\\\\]\\]|[^\\]]|[^\\\\]\\[[^/]*)*/'))
+char_range_heuristics = NegativeLookahead(Alternative(RegExp('[\\n\\t ]'), Series(dwsp__, literal_heuristics), Series(Option(Alternative(Text("::"), Text(":?"), Text(":"))), SYM_REGEX, RegExp('\\s*\\]'))))
 CH_LEADIN = Text("0x")
 RE_LEADOUT = Text("/")
 RE_LEADIN = Text("/")
@@ -2144,69 +2080,45 @@ class FixedEBNFGrammar(Grammar):
 AND = Text("")
 OR = Text("|")
 DEF = Text("=")
-EOF = Drop(Drop(NegativeLookahead(RegExp('.'))))
+EOF = Drop(NegativeLookahead(RegExp('.')))
 whitespace = Series(RegExp('~'), dwsp__)
 any_char = Series(Text("."), dwsp__)
+free_char = Alternative(RegExp('[^\\n\\[\\]\\\\]'), RegExp('\\\\[nrt`´\'"(){}\\[\\]/\\\\]'))
 character = Series(CH_LEADIN, HEXCODE)
+char_range = Series(Text("["), Lookahead(char_range_heuristics), Option(Text("^")), Alternative(character, free_char), ZeroOrMore(Alternative(Series(Option(Text("-")), character), free_char)), Series(Text("]"), dwsp__))
 regexp = Series(RE_LEADIN, RE_CORE, RE_LEADOUT, dwsp__)
-plaintext = Alternative(Series(RegExp('`(?:(?<!\\\\)\\\\`|[^`])*?`'), dwsp__),
-Series(RegExp('´(?:(?<!\\\\)\\\\´|[^´])*?´'), dwsp__))
-literal = Alternative(Series(RegExp('"(?:(?<!\\\\)\\\\"|[^"])*?"'), dwsp__),
-Series(RegExp("'(?:(?<!\\\\)\\\\'|[^'])*?'"), dwsp__))
+plaintext = Alternative(Series(RegExp('`(?:(?<!\\\\)\\\\`|[^`])*?`'), dwsp__), Series(RegExp('´(?:(?<!\\\\)\\\\´|[^´])*?´'), dwsp__))
+literal = Alternative(Series(RegExp('"(?:(?<!\\\\)\\\\"|[^"])*?"'), dwsp__), Series(RegExp("'(?:(?<!\\\\)\\\\'|[^'])*?'"), dwsp__))
 symbol = Series(SYM_REGEX, dwsp__)
 multiplier = Series(RegExp('[1-9]\\d*'), dwsp__)
-no_range = Alternative(NegativeLookahead(multiplier),
-Series(Lookahead(multiplier), TIMES))
-range = Series(RNG_OPEN, dwsp__, multiplier, Option(Series(RNG_DELIM, dwsp__, multiplier)),
-RNG_CLOSE, dwsp__)
-counted = Alternative(Series(countable, range), Series(countable, TIMES, dwsp__, multiplier),
-Series(multiplier, TIMES, dwsp__, countable, mandatory=3))
-option = Alternative(
-Series(Series(Text("["), dwsp__), expression, Series(Text("]"), dwsp__), mandatory=1),
-Series(element, Series(Text("?"), dwsp__)))
-repetition = Alternative(
-Series(Series(Text("{"), dwsp__), no_range, expression,
-Series(Text("}"), dwsp__), mandatory=2),
-Series(element, Series(Text("*"), dwsp__), no_range))
-oneormore = Alternative(
-Series(Series(Text("{"), dwsp__), no_range, expression, Series(Text("}+"), dwsp__)),
-Series(element, Series(Text("+"), dwsp__)))
-group = Series(Series(Text("("), dwsp__), no_range, expression,
-Series(Text(")"), dwsp__), mandatory=2)
-retrieveop = Alternative(
-Series(Text("::"), dwsp__), Series(Text(":?"), dwsp__), Series(Text(":"), dwsp__))
-flowmarker = Alternative(
-Series(Text("!"), dwsp__), Series(Text("&"), dwsp__),
-Series(Text("<-!"), dwsp__), Series(Text("<-&"), dwsp__))
+no_range = Alternative(NegativeLookahead(multiplier), Series(Lookahead(multiplier), TIMES))
+range = Series(RNG_OPEN, dwsp__, multiplier, Option(Series(RNG_DELIM, dwsp__, multiplier)), RNG_CLOSE, dwsp__)
+counted = Alternative(Series(countable, range), Series(countable, TIMES, dwsp__, multiplier), Series(multiplier, TIMES, dwsp__, countable, mandatory=3))
+option = Alternative(Series(Series(Text("["), dwsp__), expression, Series(Text("]"), dwsp__), mandatory=1), Series(element, Series(Text("?"), dwsp__)))
+repetition = Alternative(Series(Series(Text("{"), dwsp__), no_range, expression, Series(Text("}"), dwsp__), mandatory=2), Series(element, Series(Text("*"), dwsp__), no_range))
+oneormore = Alternative(Series(Series(Text("{"), dwsp__), no_range, expression, Series(Text("}+"), dwsp__)), Series(element, Series(Text("+"), dwsp__)))
+group = Series(Series(Text("("), dwsp__), no_range, expression, Series(Text(")"), dwsp__), mandatory=2)
+retrieveop = Alternative(Series(Text("::"), dwsp__), Series(Text(":?"), dwsp__), Series(Text(":"), dwsp__))
+flowmarker = Alternative(Series(Text("!"), dwsp__), Series(Text("&"), dwsp__), Series(Text("<-!"), dwsp__), Series(Text("<-&"), dwsp__))
 ANY_SUFFIX = RegExp('[?*+]')
-element.set(Alternative(
-Series(Option(retrieveop), symbol, NegativeLookahead(DEF)),
-literal, plaintext, regexp, Series(character, dwsp__), any_char, whitespace, group))
+literals = OneOrMore(literal)
 pure_elem = Series(element, NegativeLookahead(ANY_SUFFIX), mandatory=1)
-countable.set(Alternative(option, oneormore, element))
+procedure = Series(SYM_REGEX, Series(Text("()"), dwsp__))
 term = Alternative(oneormore, counted, repetition, option, pure_elem)
-difference = Series(term, Option(Series(
-Series(Text("-"), dwsp__), Alternative(oneormore, pure_elem), mandatory=1)))
+difference = Series(term, Option(Series(Series(Text("-"), dwsp__), Alternative(oneormore, pure_elem), mandatory=1)))
 lookaround = Series(flowmarker, Alternative(oneormore, pure_elem), mandatory=1)
-interleave = Series(difference, ZeroOrMore(
-Series(Series(Text("°"), dwsp__), Option(Series(Text("§"), dwsp__)), difference)))
-sequence = Series(
-Option(Series(Text("§"), dwsp__)), Alternative(interleave, lookaround),
-ZeroOrMore(Series(AND, dwsp__, Option(Series(Text("§"), dwsp__)),
-Alternative(interleave, lookaround))))
-expression.set(Series(sequence, ZeroOrMore(Series(OR, dwsp__, sequence))))
+interleave = Series(difference, ZeroOrMore(Series(Series(Text("°"), dwsp__), Option(Series(Text("§"), dwsp__)), difference)))
+sequence = Series(Option(Series(Text("§"), dwsp__)), Alternative(interleave, lookaround), ZeroOrMore(Series(AND, dwsp__, Option(Series(Text("§"), dwsp__)), Alternative(interleave, lookaround))))
 FOLLOW_UP = Alternative(Text("@"), symbol, EOF)
-procedure = Series(SYM_REGEX, Series(Text("()"), dwsp__))
-literals = OneOrMore(literal)
-component = Alternative(regexp, literals, procedure, Series(symbol, NegativeLookahead(DEF)))
-directive = Series(
-Series(Text("@"), dwsp__), symbol, Series(Text("="), dwsp__),
-Alternative(Series(component, ZeroOrMore(Series(Series(Text(","), dwsp__), component))),
-expression),
-Lookahead(FOLLOW_UP), mandatory=1)
-definition = Series(symbol, DEF, dwsp__, Option(Series(OR, dwsp__)), expression, ENDL, dwsp__,
-Lookahead(FOLLOW_UP), mandatory=1)
+definition = Series(symbol, DEF, dwsp__, Option(Series(OR, dwsp__)), expression, ENDL, dwsp__, Lookahead(FOLLOW_UP), mandatory=1)
+component = Alternative(literals, procedure, expression)
+directive = Series(Series(Text("@"), dwsp__), symbol, Series(Text("="), dwsp__), component, ZeroOrMore(Series(Series(Text(","), dwsp__), component)), Lookahead(FOLLOW_UP), mandatory=1)
+element.set(Alternative(Series(Option(retrieveop), symbol, NegativeLookahead(DEF)), literal, plaintext, regexp, Series(character, dwsp__), any_char, whitespace, group))
+countable.set(Alternative(option, oneormore, element))
+expression.set(Series(sequence, ZeroOrMore(Series(OR, dwsp__, sequence))))
 syntax = Series(dwsp__, ZeroOrMore(Alternative(definition, directive)), EOF)
+resume_rules__ = {'definition': [re.compile(r'\n\s*(?=@|\w+\w*\s*=)')],
+'directive': [re.compile(r'\n\s*(?=@|\w+\w*\s*=)')]}
 root__ = syntax
......
# ebnf.py - EBNF -> Python-Parser compilation for DHParser
#
# Copyright 2016 by Eckhart Arnold (arnold@badw.de)
# Bavarian Academy of Sciences and Humanities (badw.de)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied. See the License for the specific language governing
# permissions and limitations under the License.
"""
Module ``ebnf`` provides an EBNF-parser-generator that compiles an
EBNF-grammar into Python code that can be executed to parse source text
conforming to this grammar into concrete syntax trees.

Specifying Grammars with EBNF
-----------------------------

With DHParser, grammars can be specified either directly in Python code
(see :py:mod:`parse`) or in one of several EBNF-dialects. (Yes,
DHParser supports several different variants of EBNF! This makes it easy
to create a parser directly from grammars found in external sources.)

"EBNF" stands for "Extended Backus-Naur Form", which is a common
formalism for specifying grammars of context-free languages.
(see https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form)

The recommended way of compiling grammars with DHParser is either to
write the EBNF-specification for the grammar into a text file and then
compile the EBNF-source into an executable as well as importable Python
module with the help of the "dhparser" script or, for bigger projects,
to create a new domain-specific-language project with the same script
as described in the step-by-step guide.

Here, however, we show how to compile an EBNF-specified grammar
from within Python code and how to execute the parser that was
generated by compiling the grammar.

As an example, we will realize a JSON parser (https://www.json.org/).
Let's start by creating some test data::

    >>> testobj = {'array': [1, 2.0, "a string"], 'number': -1.3e+25, 'bool': False}
    >>> import json
    >>> testdata = json.dumps(testobj)
    >>> testdata
    '{"array": [1, 2.0, "a string"], "number": -1.3e+25, "bool": false}'

We define the JSON grammar (see https://www.json.org/) in a top-down
manner in EBNF, using a syntax that resembles regular expressions.
EBNF, as you may recall, consists of a sequence of symbol definitions.
The definiens of each definition is either a string literal, a regular
expression, another symbol, or a combination of these built with four
different operators: sequences, alternatives, options, and repetitions.
Here is how these elements are denoted in classical and regex-like
EBNF syntax:

======================== ================== ================
element                  classical EBNF     regex-like
======================== ================== ================
insignificant whitespace ~                  ~
string literal           "..." or \\`...\\`   "..." or \\`...\\`
regular expr.            /.../              /.../
sequences                A B C              A B C
alternatives             A | B | C          A | B | C
options                  [ ... ]            ...?
repetitions              { ... }            ...*
one or more                                 ...+
grouping                 (...)              (...)
======================== ================== ================
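
The four operators in the table can be illustrated with a tiny
parser-combinator sketch. It is not DHParser's implementation.
DHParser's grammar code uses its own parser classes, such as ``Series``,
``Alternative``, ``Option``, ``ZeroOrMore`` and ``OneOrMore``. The
helper names ``lit``, ``seq``, ``alt``, ``opt``, ``rep`` and ``rep1``
are hypothetical. The sketch assumes that each parser takes the text
and a position and returns a ``(result, new_position)`` pair, or
``None`` on failure.

```python
from typing import Callable, List, Optional, Tuple

# A parser maps (text, position) to (result, new_position), or to None on failure.
Result = Optional[Tuple[object, int]]
Parser = Callable[[str, int], Result]


def lit(s: str) -> Parser:
    # string literal: "..."
    def parse(text: str, pos: int) -> Result:
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse


def seq(*parsers: Parser) -> Parser:
    # sequence: A B C, every parser must match, one after the other
    def parse(text: str, pos: int) -> Result:
        results: List[object] = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            value, pos = r
            results.append(value)
        return results, pos
    return parse


def alt(*parsers: Parser) -> Parser:
    # alternatives: A | B | C, tried in order; the first match wins
    def parse(text: str, pos: int) -> Result:
        for p in parsers:
            r = p(text, pos)
            if r is not None:
                return r
        return None
    return parse


def opt(parser: Parser) -> Parser:
    # option: [ ... ] or ...?, succeeds with None if the parser fails
    def parse(text: str, pos: int) -> Result:
        r = parser(text, pos)
        return r if r is not None else (None, pos)
    return parse


def rep(parser: Parser) -> Parser:
    # repetition: { ... } or ...*, zero or more matches
    def parse(text: str, pos: int) -> Result:
        results: List[object] = []
        while True:
            r = parser(text, pos)
            if r is None or r[1] == pos:  # stop on failure or if nothing was consumed
                return results, pos
            value, pos = r
            results.append(value)
    return parse


def rep1(parser: Parser) -> Parser:
    # one or more: ...+
    many = rep(parser)

    def parse(text: str, pos: int) -> Result:
        r = many(text, pos)
        return r if r is not None and r[0] else None
    return parse


# Usage: an optional minus sign followed by one or more digits
digit = alt(*[lit(d) for d in "0123456789"])
number = seq(opt(lit("-")), rep1(digit))
print(number("-42", 0))  # (['-', ['4', '2']], 3)
print(number("abc", 0))  # None
```

In this sketch, alternatives use ordered choice (first match wins).
Classical EBNF treats alternatives as unordered, so a sketch like this
one can behave differently from a textbook reading of a grammar when
two alternatives match the same input.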

"Insignificant whitespace" is a speciality of DHParser. Denoting
insignificant whitespace with a dedicated sign, ``~``, makes it possible
to eliminate it already during parsing, without burdening later
syntax-tree-processing stages with this common task. DHParser offers
several more facilities to restrain the verbosity of the concrete
syntax tree, so that the outcome of the parsing stage already comes
close (or at least closer) to the intended abstract syntax tree.
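
The effect of dropping whitespace during parsing can be shown with a
minimal sketch. It assumes a simplified model in which every literal
may be followed by insignificant whitespace. It is not DHParser's code.
``match_ws`` and ``parse_sequence`` are hypothetical names, and
``keep_ws`` only exists here to contrast the two behaviours.

```python
from typing import List, Optional, Tuple


def match_ws(text: str, pos: int) -> Tuple[str, int]:
    # Consume insignificant whitespace (what ~ marks) starting at pos.
    start = pos
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return text[start:pos], pos


def parse_sequence(text: str, literals: List[str],
                   keep_ws: bool = False) -> Optional[List[str]]:
    # Parse  ~ lit_1 ~ lit_2 ~ ... lit_n ~  and return the resulting tree nodes.
    # With keep_ws=False, whitespace is consumed but never becomes a node, i.e.
    # it is dropped while parsing instead of in a later tree-processing pass.
    nodes: List[str] = []
    ws, pos = match_ws(text, 0)
    if keep_ws and ws:
        nodes.append(ws)
    for lit in literals:
        if not text.startswith(lit, pos):
            return None
        nodes.append(lit)
        ws, pos = match_ws(text, pos + len(lit))
        if keep_ws and ws:
            nodes.append(ws)
    return nodes if pos == len(text) else None


src = " [ 1 ,  2 ] "
parts = ["[", "1", ",", "2", "]"]
print(parse_sequence(src, parts))                # ['[', '1', ',', '2', ']']
print(parse_sequence(src, parts, keep_ws=True))  # whitespace nodes interleaved
```

In this sketch, dropping whitespace early keeps the resulting node list
small. The trade-off is that the exact original spacing can no longer
be reconstructed from the nodes alone.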

JSON consists of two complex data types: 1) associative arrays,
called "object", and 2) sequences of heterogeneous data, called "array"; and