badw-it / DHParser / Commits / 225f8299

Commit 225f8299, authored Apr 10, 2018 by di68kap

    Merge branch 'master' of https://gitlab.lrz.de/badw-it/DHParser

Parents: 23ff2da2, 5618278f
Changes: 24 files
CONTRIBUTING.md

@@ -9,7 +9,7 @@ The best (and easiest) way to contribute at this stage is to try to implement

a small DSL with DHParser and report bugs and problems and make suggestions
for further development. Have a look at the README.md-file to get started.

Please, use the code from the git repository. Because code still changes quickly,
any prepackaged builds may be outdated. The repository is here:

https://gitlab.lrz.de/badw-it/DHParser
@@ -25,8 +25,8 @@ bigger projects, below:

Ideas for further development
=============================

Better error reporting I
------------------------

A problem with error reporting consists in the fact that at best only the very
first parsing error is reported accurately and then triggers a number of pure

@@ -49,10 +49,44 @@ left recursion stack, etc. without making the parser guard (see

Also, a good variety of test cases would be desirable.

Better error reporting II
-------------------------

Yet another means to improve error reporting would be to supplement the
required operator "&" with a forbidden operator, say "!&", that would
raise an error if some parser matches at a place where it really
shouldn't. [Add some examples here.]
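To make the idea concrete, here is a minimal self-contained sketch of such a forbidden operator; the names (`forbidden`, `literal`, `ParseError`) are illustrative only and not part of DHParser's actual parser API:

```python
class ParseError(Exception):
    """Raised when a forbidden parser matches after all."""

def literal(s):
    """Toy parser: returns (match, rest) on success, else (None, text)."""
    def parse(text):
        return (s, text[len(s):]) if text.startswith(s) else (None, text)
    return parse

def forbidden(parser, message):
    """Sketch of a '!&' operator: like a negative lookahead, it succeeds
    without consuming input as long as `parser` does NOT match -- but if
    `parser` does match, it raises an error instead of merely failing."""
    def parse(text):
        node, _ = parser(text)
        if node is not None:
            raise ParseError('%s at %r' % (message, text[:20]))
        return '', text  # match the empty string, consume nothing
    return parse

# A grammar could forbid a doubled delimiter at a certain position:
guard = forbidden(literal(';;'), 'superfluous semicolon')
assert guard('x = 1') == ('', 'x = 1')  # nothing forbidden here
```

With `guard('x = 1')` parsing simply continues, while `guard(';; x = 1')` raises `ParseError` with the given message, which is the kind of targeted diagnosis this section asks for.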
Optimization and Enhancement: Two-way-Traversal for AST-Transformation
----------------------------------------------------------------------

AST-transformations are done via a depth-first tree-traversal, that is,
the traversal function first descends all the way down to the leaf nodes
and calls the transformation routines successively on the way back up.
The routines are picked from the transformation-table, which is a
dictionary mapping a Node's tag name to a sequence of transformation
functions.

The rationale for depth-first is that it is easier to transform a node
if all of its children have already been transformed, i.e. simplified.
However, there are quite a few cases where depth-last would be better.
For example, if you know you are going to discard a whole branch
starting from a certain node, it is a waste to transform all the child
nodes first.

As the tree is traversed anyway, there is no good reason why certain
transformation routines should not already be called on the way down.
Of course, as most routines more or less assume depth-first, we would
need two transformation tables: one for the routines that are called on
the way down and one for the routines that are called on the way back
up. This should be fairly easy to implement.
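A rough stand-alone sketch of the proposed two-table traversal (a toy `Node` class and table layout, not DHParser's actual `traverse` API):

```python
class Node:
    def __init__(self, tag, children=()):
        self.tag = tag
        self.children = list(children)

def traverse(pre_table, post_table, node):
    """Depth-first traversal with two transformation tables: routines from
    `pre_table` run before a node's children are visited, routines from
    `post_table` afterwards (the classic depth-first case)."""
    for fn in pre_table.get(node.tag, []):
        fn(node)
    for child in node.children:
        traverse(pre_table, post_table, child)
    for fn in post_table.get(node.tag, []):
        fn(node)

# Discarding a doomed branch early saves transforming all of its children:
def discard_children(node):
    node.children = []

tree = Node('root', [Node('junk', [Node('leaf'), Node('leaf')])])
visited = []
traverse({'junk': [discard_children]},             # called on the way down
         {'leaf': [lambda n: visited.append(n.tag)]},  # never reached
         tree)
assert visited == []  # the leaves were pruned before they were visited
```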
Optimization: Early discarding of nodes
---------------------------------------

Reason: `traverse_recursive` and the `Node.result`-setter are top time
consumers!

Allow to specify parsers/nodes, the result of which
...
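The proposal above is cut off at this point. As a hedged illustration of what early discarding might look like, the following sketch drops the results of black-listed parsers at node-construction time, so that neither the `Node.result`-setter nor any later transformation has to touch them (tuple-based toy nodes, not DHParser's classes):

```python
def make_node(name, children, drop_set):
    """Build a (name, children) node, immediately discarding children whose
    parser names are in `drop_set` (e.g. anonymous whitespace or tokens)."""
    kept = [c for c in children if c[0] not in drop_set]
    return (name, kept)

# Anonymous whitespace and token nodes never enter the tree at all:
DROP = {':Whitespace', ':Token'}
node = make_node('definition',
                 [('symbol', 'x'), (':Whitespace', ' '),
                  (':Token', '='), ('expression', '1')],
                 DROP)
assert node == ('definition', [('symbol', 'x'), ('expression', '1')])
```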
DHParser/dsl.py

@@ -87,11 +87,11 @@ try:

except ImportError:
    import re
from DHParser import logging, is_filename, load_if_file, \
    Grammar, Compiler, nil_preprocessor, PreprocessorToken, Whitespace, \
    Lookbehind, Lookahead, Alternative, Pop, Token, Synonym, AllOf, SomeOf, Unordered, \
    Option, NegativeLookbehind, OneOrMore, RegExp, Retrieve, Series, RE, Capture, \
    ZeroOrMore, Forward, NegativeLookahead, mixin_comment, compile_source, \
    grammar_changed, last_value, counterpart, accumulate, PreprocessorFunc, \
    Node, TransformationFunc, TransformationDict, \
    traverse, remove_children_if, merge_children, is_anonymous, \
    reduce_single_child, replace_by_single_child, replace_or_reduce, remove_whitespace, \

@@ -120,6 +120,14 @@ def compile_src(source, log_dir=''):

if __name__ == "__main__":
    if len(sys.argv) > 1:
        try:
            grammar_file_name = os.path.basename(__file__).replace('Compiler.py', '.ebnf')
            if grammar_changed({NAME}Grammar, grammar_file_name):
                print("Grammar has changed. Please recompile Grammar first.")
                sys.exit(1)
        except FileNotFoundError:
            print('Could not check for changed grammar, because grammar file "%s" was not found!'
                  % grammar_file_name)
        file_name, log_dir = sys.argv[1], ''
        if file_name in ['-d', '--debug'] and len(sys.argv) > 2:
            file_name, log_dir = sys.argv[2], 'LOGS'
DHParser/ebnf.py

@@ -30,7 +30,7 @@ from functools import partial

from DHParser.compile import CompilerError, Compiler
from DHParser.error import Error
from DHParser.parse import Grammar, mixin_comment, Forward, RegExp, Whitespace, RE, \
    NegativeLookahead, Alternative, Series, Option, OneOrMore, ZeroOrMore, Token
from DHParser.preprocess import nil_preprocessor, PreprocessorFunc
from DHParser.syntaxtree import Node, WHITESPACE_PTYPE, TOKEN_PTYPE

@@ -77,56 +77,53 @@ def get_ebnf_preprocessor() -> PreprocessorFunc:

class EBNFGrammar(Grammar):
    r"""Parser for an EBNF source file, with this grammar:

        # EBNF-Grammar in EBNF

        @ comment    = /#.*(?:\n|$)/          # comments start with '#' and
                                              # eat all chars up to and including '\n'
        @ whitespace = /\s*/                  # whitespace includes linefeed
        @ literalws  = right                  # trailing whitespace of literals will be
                                              # ignored tacitly

        syntax     = [~//] { definition | directive } §EOF
        definition = symbol §"=" expression
        directive  = "@" §symbol "=" ( regexp | literal | list_ )

        expression = term { "|" term }
        term       = { ["§"] factor }+        # "§" means all following factors mandatory
        factor     = [flowmarker] [retrieveop] symbol !"="  # negative lookahead to be sure
                                                            # it's not a definition
                   | [flowmarker] literal
                   | [flowmarker] plaintext
                   | [flowmarker] regexp
                   | [flowmarker] whitespace
                   | [flowmarker] oneormore
                   | [flowmarker] group
                   | [flowmarker] unordered
                   | repetition
                   | option

        flowmarker = "!" | "&"                # '!' negative lookahead, '&' positive lookahead
                   | "-!" | "-&"              # '-' negative lookbehind, '-&' positive lookbehind
        retrieveop = "::" | ":"               # '::' pop, ':' retrieve

        group      = "(" §expression ")"
        unordered  = "<" §expression ">"      # elements of expression in arbitrary order
        oneormore  = "{" expression "}+"
        repetition = "{" §expression "}"
        option     = "[" §expression "]"

        symbol     = /(?!\d)\w+/~             # e.g. expression, factor, parameter_list
        literal    = /"(?:[^"]|\\")*?"/~      # e.g. "(", '+', 'while'
                   | /'(?:[^']|\\')*?'/~      # whitespace following literals will be ignored
        plaintext  = /`(?:[^"]|\\")*?`/~      # like literal but does not eat whitespace
        regexp     = /~?\/(?:\\\/|[^\/])*?\/~?/~  # e.g. /\w+/, ~/#.*(?:\n|$)/~
                                              # '~' is a whitespace-marker, if present leading
                                              # or trailing whitespace of a regular expression
                                              # will be ignored tacitly.
        whitespace = /~/~                     # implicit or default whitespace
        list_      = /\w+/~ { "," /\w+/~ }    # comma separated list of symbols,
                                              # e.g. BEGIN_LIST, END_LIST,
                                              # BEGIN_QUOTE, END_QUOTE
                                              # see CommonMark/markdown.py for an example
        EOF        = !/./
    """
    expression = Forward()
    source_hash__ = "3fc9f5a340f560e847d9af0b61a68743"
    parser_initialization__ = "upon instantiation"
@@ -135,9 +132,12 @@ class EBNFGrammar(Grammar):

    WSP__ = mixin_comment(whitespace=WHITESPACE__, comment=COMMENT__)
    wspL__ = ''
    wspR__ = WSP__
    whitespace__ = Whitespace(WSP__)
    EOF = NegativeLookahead(RegExp('.'))
    list_ = Series(RE('\\w+'), ZeroOrMore(Series(Token(","), RE('\\w+'))))
    whitespace = RE('~')
    regexp = RE('~?/(?:\\\\/|[^/])*?/~?')
    plaintext = RE('`(?:[^"]|\\\\")*?`')
    literal = Alternative(RE('"(?:[^"]|\\\\")*?"'), RE("'(?:[^']|\\\\')*?'"))
    symbol = RE('(?!\\d)\\w+')
    option = Series(Token("["), expression, Token("]"), mandatory=1)

@@ -147,18 +147,16 @@ class EBNFGrammar(Grammar):

    group = Series(Token("("), expression, Token(")"), mandatory=1)
    retrieveop = Alternative(Token("::"), Token(":"))
    flowmarker = Alternative(Token("!"), Token("&"), Token("-!"), Token("-&"))
    factor = Alternative(
        Series(Option(flowmarker), Option(retrieveop), symbol, NegativeLookahead(Token("="))),
        Series(Option(flowmarker), literal),
        Series(Option(flowmarker), plaintext),
        Series(Option(flowmarker), regexp),
        Series(Option(flowmarker), whitespace),
        Series(Option(flowmarker), oneormore),
        Series(Option(flowmarker), group),
        Series(Option(flowmarker), unordered),
        repetition, option)
    term = OneOrMore(Series(Option(Token("§")), factor))
    expression.set(Series(term, ZeroOrMore(Series(Token("|"), term))))
    directive = Series(Token("@"), symbol, Token("="),
                       Alternative(regexp, literal, list_), mandatory=1)
    definition = Series(symbol, Token("="), expression, mandatory=1)
    syntax = Series(Option(RE('', wR='', wL=WSP__)),
                    ZeroOrMore(Alternative(definition, directive)), EOF, mandatory=2)
    root__ = syntax
@@ -385,6 +383,7 @@ class EBNFCompiler(Compiler):

    COMMENT_KEYWORD = "COMMENT__"
    WHITESPACE_KEYWORD = "WSP__"
    RAW_WS_KEYWORD = "WHITESPACE__"
    WHITESPACE_PARSER_KEYWORD = "whitespace__"
    RESERVED_SYMBOLS = {WHITESPACE_KEYWORD, RAW_WS_KEYWORD, COMMENT_KEYWORD}
    AST_ERROR = "Badly structured syntax tree. " \
                "Potentially due to erroneous AST transformation."

@@ -415,7 +414,7 @@ class EBNFCompiler(Compiler):

        self.definitions = {}     # type: Dict[str, str]
        self.deferred_tasks = []  # type: List[Callable]
        self.root_symbol = ""     # type: str
        self.directives = {'whitespace': self.WHITESPACE['vertical'],
                           'comment': '',
                           'literalws': {'right'},
                           'tokens': set(),  # alt. 'preprocessor_tokens'

@@ -494,6 +493,12 @@ class EBNFCompiler(Compiler):

        return '\n'.join(compiler)

    def verify_transformation_table(self, transtable):
        """
        Checks for symbols that occur in the transformation-table but have
        never been defined in the grammar. Usually, this kind of
        inconsistency results from an error like a typo in the transformation
        table.
        """
        assert self._dirty_flag
        table_entries = set(expand_table(transtable).keys()) - {'*', '+', '~'}
        symbols = self.rules.keys()
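Stripped of the DHParser-specific details (`expand_table`, `self.rules`), the kind of check that `verify_transformation_table` performs can be illustrated as follows; the function and variable names here are chosen for the example only:

```python
def find_unknown_symbols(transtable, defined_symbols):
    """Return transformation-table keys that do not name a defined grammar
    symbol; a non-empty result usually points to a typo in the table."""
    wildcards = {'*', '+', '~'}  # special keys that are always allowed
    return set(transtable) - wildcards - set(defined_symbols)

# 'terrm' is a typo for 'term' and gets reported:
table = {'expression': [], 'terrm': [], '*': []}
assert find_unknown_symbols(table, {'expression', 'term'}) == {'terrm'}
```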
@@ -528,6 +533,8 @@ class EBNFCompiler(Compiler):

        # add special fields for Grammar class
        definitions.append((self.WHITESPACE_PARSER_KEYWORD,
                            'Whitespace(%s)' % self.WHITESPACE_KEYWORD))
        definitions.append(('wspR__', self.WHITESPACE_KEYWORD
                            if 'right' in self.directives['literalws'] else "''"))
        definitions.append(('wspL__', self.WHITESPACE_KEYWORD

@@ -906,9 +913,13 @@ class EBNFCompiler(Compiler):

        return symbol

    def on_literal(self, node: Node) -> str:
        return 'Token(' + node.content.replace('\\', r'\\') + ')'

    def on_plaintext(self, node: Node) -> str:
        return 'Token(' + node.content.replace('\\', r'\\').replace('`', '"') \
               + ", wL='', wR='')"

    def on_regexp(self, node: Node) -> str:

@@ -942,6 +953,10 @@ class EBNFCompiler(Compiler):

        return parser + ', '.join([arg] + name) + ')'

    def on_whitespace(self, node: Node) -> str:
        return 'whitespace__'

    def on_list_(self, node) -> Set[str]:
        assert node.children
        return set(item.result.strip() for item in node.children)
DHParser/log.py

@@ -203,10 +203,10 @@ class HistoryRecord:

    FAIL = "FAIL"
    Snapshot = collections.namedtuple('Snapshot', ['line', 'column', 'stack', 'status', 'text'])
    COLGROUP = '<colgroup>\n<col style="width:2%"/><col style="width:2%"/><col ' \
               'style="width:75%"/><col style="width:6%"/><col style="width:15%"/>\n</colgroup>'
    HEADINGS = ('<tr><th>L</th><th>C</th><th>parser call sequence</th>'
                '<th>success</th><th>text matched or failed</th></tr>')
    HTML_LEAD_IN = ('<!DOCTYPE html>\n'
                    '<html>\n<head>\n<meta charset="utf-8"/>\n<style>\n'
                    'td,th {font-family:monospace; '

@@ -289,7 +289,7 @@ class HistoryRecord:

    @property
    def stack(self) -> str:
        return "->".join((p.repr if p.ptype in {':RegExp', ':PlainText'}
                          else p.name or p.ptype)
                         for p in self.call_stack)

    @property
DHParser/parse.py

@@ -48,6 +48,7 @@ __all__ = ('Parser',

           'Grammar',
           'PreprocessorToken',
           'RegExp',
           'Whitespace',
           'RE',
           'Token',
           'mixin_comment',

@@ -445,16 +446,18 @@ class Grammar:

        history_tracking__: A flag indicating that the parsing history shall
            be tracked

        whitespace__: A parser for the implicit optional whitespace (or the
            :class:`zombie-parser` if the default is empty). The default
            whitespace will be used by parsers :class:`Token` and, if no
            other parsers are passed to its constructor, by parser
            :class:`RE`. It can also be placed explicitly in the
            EBNF-Grammar via the "~"-sign.

        wsp_left_parser__: The same as ``whitespace__`` for
            left-adjacent-whitespace.

        wsp_right_parser__: The same as ``whitespace__`` for
            right-adjacent-whitespace.

        _dirty_flag__: A flag indicating that the Grammar has been called at
            least once so that the parsing-variables need to be reset

@@ -591,18 +594,22 @@ class Grammar:

        # do so only arises during testing.
        self.root__ = copy.deepcopy(root) if root else copy.deepcopy(self.__class__.root__)

        if self.WSP__:
            try:
                probe = self.whitespace__
                assert self.whitespace__.regexp.pattern == self.WSP__
            except AttributeError:
                self.whitespace__ = Whitespace(self.WSP__)
            self.whitespace__.grammar = self
            self.all_parsers__.add(self.whitespace__)  # don't you forget about me...
        else:
            self.whitespace__ = ZOMBIE_PARSER

        assert not self.wspL__ or self.wspL__ == self.WSP__
        assert not self.wspR__ or self.wspR__ == self.WSP__
        self.wsp_left_parser__ = self.whitespace__ if self.wspL__ else ZOMBIE_PARSER
        self.wsp_right_parser__ = self.whitespace__ if self.wspR__ else ZOMBIE_PARSER

        self.root__.apply(self._add_parser__)

@@ -884,6 +891,9 @@ class PlainText(Parser):

            return Node(self, self.text, True), text[self.len:]
        return None, text

    def __repr__(self):
        return ("'%s'" if self.text.find("'") <= 0 else '"%s"') % self.text


class RegExp(Parser):
    r"""

@@ -941,6 +951,28 @@ class Whitespace(RegExp):

    assert WHITESPACE_PTYPE == ":Whitespace"


#######################################################################
#
# WARNING: The following code is hard to maintain, because it
# introduces a special case, i.e. a parser with child parsers that is
# not a descendant of the NaryOperator, and because it interacts
# with the constructor of the Grammar class (see the instantiations of
# the Whitespace-class there).
#
# That is all the more regrettable, as class RE basically just
# introduces syntactical sugar for
#
#     Series(whitespace__, RegExp('something'), whitespace__)
#
# What to do? Throw the syntactical sugar out? :-( Or find a more
# robust solution for that kind of syntactical sugar? Or just leave
# it be?
#
######################################################################


class RE(Parser):
    r"""
    Regular Expressions with optional leading or trailing whitespace.

@@ -982,9 +1014,8 @@ class RE(Parser):

        wL (str or regexp): Left whitespace regular expression,
            i.e. either ``None``, the empty string or a regular
            expression (e.g. "\s*") that defines whitespace. An
            empty string means no whitespace will be skipped; ``None``
            means that the default whitespace will be used.
        wR (str or regexp): Right whitespace regular expression.
            See above.
        name: The optional name of the parser.
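As the maintenance warning in parse.py notes, class RE is essentially syntactic sugar for a series of optional whitespace, a core regular expression, and optional whitespace again. With plain `re`, the equivalent behaviour can be sketched like this (an illustration only, not DHParser's implementation):

```python
import re

def match_RE(pattern, text, wL=r'\s*', wR=r'\s*'):
    """Roughly what an RE parser does: skip leading whitespace, match the
    core regular expression, then skip trailing whitespace. Returns the
    core match and the unconsumed rest, or (None, text) on failure."""
    m = re.match('(%s)(%s)(%s)' % (wL, pattern, wR), text)
    if m:
        return m.group(2), text[m.end():]
    return None, text

assert match_RE(r'\w+', '   foo   bar') == ('foo', 'bar')
assert match_RE(r'\d+', 'foo') == (None, 'foo')
```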
DHParser/syntaxtree.py

@@ -39,6 +39,7 @@ __all__ = ('ParserBase',

           'MockParser',
           'ZombieParser',
           'ZOMBIE_PARSER',
           'ZOMBIE_NODE',
           'Node',
           'mock_syntax_tree',
           'flatten_sxpr')

@@ -724,6 +725,9 @@ class Node(collections.abc.Sized):

        return sum(child.tree_size() for child in self.children) + 1


ZOMBIE_NODE = Node(ZOMBIE_PARSER, '')


def mock_syntax_tree(sxpr):
    """
    Generates a tree of nodes from an S-expression. The main purpose of this is
DHParser/transform.py

@@ -30,7 +30,7 @@ for CST -> AST transformations.

import inspect
from functools import partial, reduce, singledispatch

from DHParser.syntaxtree import Node, WHITESPACE_PTYPE, TOKEN_PTYPE, MockParser, ZOMBIE_NODE
from DHParser.toolkit import expand_table, smart_list, re, typing
from typing import AbstractSet, Any, ByteString, Callable, cast, Container, Dict, \
    List, Sequence, Union, Text

@@ -108,7 +108,7 @@ def transformation_factory(t1=None, t2=None, t3=None, t4=None, t5=None):

    dispatch on the first parameter after the context parameter.

    Decorating a transformation-function that has more than merely the
    ``context``-parameter with ``transformation_factory`` creates a
    function with the same name, which returns a partial-function that
    takes just the context-parameter.

@@ -158,7 +158,7 @@ def transformation_factory(t1=None, t2=None, t3=None, t4=None, t5=None):

        f = singledispatch(f)
        try:
            if len(params) == 1 and issubclass(p1type, Container) \