badw-it / DHParser / Commits / c493a67b

Commit c493a67b authored Oct 10, 2018 by Eckhart Arnold

Bereinigungen (cleanups)

parent adf2be35

Changes 21
.gitignore
...
...
@@ -44,3 +44,4 @@ _static
_templates
.vs
OLDSTUFF
.pytest_cache
\ No newline at end of file
Introduction.md
...
...
@@ -386,7 +386,7 @@ scroll down to the AST section, you'll see something like this:
"ZEICHENFOLGE, NZ, JAHRESZAHL": content_from_sinlge_child,
"WORT, NAME, LEERZEILE, ENDE": [],
":Whitespace": replace_content(lambda node : " "),
":Token
, :RE
": content_from_sinlge_child,
":Token": content_from_sinlge_child,
"*": replace_by_single_child
}
...
...
build_cython-modules.sh
#!/bin/sh
# CFLAGS="-O3 -march=native -mtune=native"
-python37 setup.py build_ext --inplace
+python3 setup.py build_ext --inplace
documentation/StepByStepGuide.rst
...
...
@@ -637,7 +637,7 @@ can easily write your own. How does this look like? ::
"part": [],
"WORD": [],
"EOF": [],
":
_
Token
, :_RE
": reduce_single_child,
":Token": reduce_single_child,
"*": replace_by_single_child
}
...
...
@@ -653,8 +653,8 @@ reached the transformations for its descendant nodes have already been applied.
As you can see, the transformation-table contains an entry for every known
parser, i.e. "document", "sentence", "part", "WORD", "EOF". (If any of these are
missing in the table of your ``poetryCompiler.py``, add them now!) In the
-template you'll also find transformations for two anonymous parsers, i.e.
-":_Token" and ":_RE" as well as some curious entries such as "*" and "+". The
+template you'll also find transformations for the anonymous parser ":Token"
+as well as some curious entries such as "*" and "+". The
latter are considered to be "jokers". The transformations related to the
"+"-sign will be applied on any node, before any other transformation is
applied. In this case, all empty nodes will be removed first (transformation:
...
...
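As an aside (an editor's illustration, not part of this commit): the dispatch order described above can be shown with a small, self-contained sketch. ``Node``, ``remove_empty_children`` and ``apply_table`` below are simplified stand-ins made up for this example; DHParser's real ``traverse`` function is considerably more general::

    class Node:
        def __init__(self, tag, children=(), content=''):
            self.tag, self.children, self.content = tag, list(children), content

    def remove_empty_children(node):
        # stand-in for the 'remove_empty' transformation: drop children that
        # carry neither content nor children of their own
        node.children = [c for c in node.children if c.children or c.content]

    def apply_table(node, table):
        # apply the table depth-first, so children are transformed before parents
        for child in node.children:
            apply_table(child, table)
        for func in table.get('+', []):                  # "+" runs on every node first
            func(node)
        entry = table.get(node.tag, table.get('*', []))  # "*" is the fallback joker
        for func in (entry if isinstance(entry, list) else [entry]):
            func(node)

    transformation_table = {
        '+': [remove_empty_children],
        'WORD': [],          # named parsers get explicit entries
        '*': [],             # joker for all tags without an entry of their own
    }

    tree = Node('document', [Node('WORD', content='shadow'), Node('WORD')])
    apply_table(tree, transformation_table)
    print([c.content for c in tree.children])            # -> ['shadow']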
@@ -722,10 +722,10 @@ Running the "poetryCompiler.py"-script on "macbeth.dsl" again, yields::
<WORD>shadow</WORD>
</part>
<:Series>
-<:_Token>
+<:Token>
<:PlainText>,</:PlainText>
<:Whitespace> </:Whitespace>
-</:_Token>
+</:Token>
<part>
<WORD>a</WORD>
...
...
...
@@ -734,11 +734,10 @@ It starts to become more readable and concise, but there are sill some oddities.
Firstly, the Tokens that delimit parts of sentences still contain whitespace.
Secondly, if several <part>-nodes follow each other in a <sentence>-node, the
<part>-nodes after the first one are enclosed by a <:Series>-node or even a
-cascade of <:ZeroOrMore> and <:Series>-nodes. As for the <:_Token>-nodes, we
+cascade of <:ZeroOrMore> and <:Series>-nodes. As for the <:Token>-nodes, we
can do the same trick as with the WORD-nodes::

-    ":_Token": [remove_whitespace, reduce_single_child],
-    ":_RE": reduce_single_child,
+    ":Token": [remove_whitespace, reduce_single_child],

As to the nested structure of the <part>-nodes within the <sentence>-node, this
is a rather typical case of syntactic artifacts that can be found in concrete
...
...
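As an illustration (an editor's sketch, not part of this commit and not DHParser's actual code): this is roughly what the two transformations do to the <:Token> node shown above; ``Node`` and the two helpers are simplified stand-ins::

    class Node:
        def __init__(self, tag, children=(), content=''):
            self.tag, self.children, self.content = tag, list(children), content

    def remove_whitespace(node):
        # stand-in: drop anonymous whitespace children
        node.children = [c for c in node.children if c.tag != ':Whitespace']

    def reduce_single_child(node):
        # stand-in: if only one child is left, pull its data up into the parent
        if len(node.children) == 1:
            only = node.children[0]
            node.content, node.children = only.content, only.children

    token = Node(':Token', [Node(':PlainText', content=','),
                            Node(':Whitespace', content=' ')])
    remove_whitespace(token)
    reduce_single_child(token)
    print(token.tag, repr(token.content))   # -> :Token ','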
@@ -807,7 +806,7 @@ Now that everything is set, let's have a look at the result::
<WORD>walking</WORD>
<WORD>shadow</WORD>
</part>
-<:_Token>,</:_Token>
+<:Token>,</:Token>
<part>
<WORD>a</WORD>
<WORD>poor</WORD>
...
...
@@ -816,8 +815,8 @@ Now that everything is set, let's have a look at the result::
That is much better. There is but one slight blemish in the output: While all
nodes left are named nodes, i.e. nodes associated with a named parser, there are a
-few anonymous <:_Token> nodes. Here is a little exercise: Do away with those
-<:_Token>-nodes by replacing them by something semantically more meaningful.
+few anonymous <:Token> nodes. Here is a little exercise: Do away with those
+<:Token>-nodes by replacing them by something semantically more meaningful.
Hint: Add a new symbol "delimiter" in the grammar definition "poetry.ebnf". An
alternative strategy to extending the grammar would be to use the
``replace_parser`` operator. Which of the strategies is the better one? Explain
...
...
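For readers who want a head start on the exercise, here is one possible direction as an editor's sketch (an assumption, not the guide's model solution): the anonymous <:Token> delimiters could simply be renamed during AST transformation, which is roughly the job of the ``replace_parser`` operator mentioned in the hint. The ``rename_token_to_delimiter`` helper and the minimal ``Node`` class are made up for this illustration::

    class Node:                              # minimal stand-in, as in the sketches above
        def __init__(self, tag, content=''):
            self.tag, self.content = tag, content

    def rename_token_to_delimiter(node):
        # hypothetical helper: give the anonymous delimiter a meaningful name
        if node.tag == ':Token':
            node.tag = 'delimiter'           # <:Token>,</:Token> -> <delimiter>,</delimiter>

    n = Node(':Token', ',')
    rename_token_to_delimiter(n)
    print(n.tag, repr(n.content))            # -> delimiter ','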
experimental/fascitergula_alternative.xml
deleted
100644 → 0
This diff is collapsed.
experimental/new2/README.md
deleted
100644 → 0
# new2
PLACE A SHORT DESCRIPTION HERE
Author: AUTHOR'S NAME <EMAIL>, AFFILIATION

## License

new2 is open source software under the
[Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0)

Copyright YEAR AUTHOR'S NAME <EMAIL>, AFFILIATION
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
experimental/new2/example.dsl
deleted
100644 → 0
Life is but a walking shadow.
experimental/new2/grammar_tests/01_test_word.ini
deleted
100644 → 0
[match:WORD]
M1: word
M2: one_word_with_underscores

[match:WORD]
M3: Life’s

[fail:WORD]
F1: two words
F2: ""
experimental/new2/grammar_tests/02_test_document.ini
deleted
100644 → 0
[match:document]
M1:
"""This
is
a
sequence
of
words
extending
over
several
lines."""
M2:
"""
This
sequence
contains
leading
whitespace."""
[fail:document]
F1:
"""This
test
should
fail,
because
neither
comma
nor
full
have
been
defined
anywhere"""
experimental/new2/grammar_tests/03_test_sentence.ini
deleted
100644 → 0
[match:part]
M1:
"""a
poor
player
that
struts
and
frets
his
hour
upon
the
stage"""
[fail:part]
F1:
"""It
is
a
tale
told
by
an
idiot,"""
[match:sentence]
M1:
"""It
is
a
tale
told
by
an
idiot,
full
of
sound
and
fury,
signifying
nothing."""
M2:
"""Plain
old
sentence."""
[fail:sentence]
F1:
"""Ups,
a
full
stop
is
missing"""
F2:
"""No
commas
at
the
end,."""
experimental/new2/macbeth.dsl
deleted
100644 → 0
Life’s but a walking shadow, a poor player that struts and frets his hour
upon the stage and then is heard no more. It is a tale told by an idiot,
full of sound and fury, signifying nothing.
experimental/new2/new2.ebnf
deleted
100644 → 0
document = ~ { sentence } §EOF
sentence = part {"," part } "."
part = { WORD }+
WORD = /[\w’]+/~
EOF = !/./
experimental/new2/new2Compiler.py
deleted
100755 → 0
#!/usr/bin/python
#######################################################################
#
# SYMBOLS SECTION - Can be edited. Changes will be preserved.
#
#######################################################################
from functools import partial
import os
import sys

sys.path.append(r'/home/eckhart/Entwicklung/DHParser')

try:
    import regex as re
except ImportError:
    import re
from DHParser import logging, is_filename, load_if_file, \
    Grammar, Compiler, nil_preprocessor, PreprocessorToken, Whitespace, \
    Lookbehind, Lookahead, Alternative, Pop, _Token, Synonym, AllOf, SomeOf, Unordered, \
    Option, NegativeLookbehind, OneOrMore, RegExp, Retrieve, Series, _RE, Capture, \
    ZeroOrMore, Forward, NegativeLookahead, mixin_comment, compile_source, \
    last_value, counterpart, accumulate, PreprocessorFunc, \
    Node, TransformationFunc, TransformationDict, \
    traverse, remove_children_if, merge_children, is_anonymous, \
    reduce_single_child, replace_by_single_child, replace_or_reduce, remove_whitespace, \
    remove_expendables, remove_empty, remove_tokens, flatten, is_whitespace, \
    is_empty, is_expendable, collapse, replace_content, WHITESPACE_PTYPE, TOKEN_PTYPE, \
    remove_nodes, remove_content, remove_brackets, replace_parser, \
    keep_children, is_one_of, has_content, apply_if, remove_first, remove_last, \
    remove_anonymous_empty, keep_nodes, traverse_locally, strip, lstrip, rstrip, \
    grammar_changed


#######################################################################
#
# PREPROCESSOR SECTION - Can be edited. Changes will be preserved.
#
#######################################################################

def new2Preprocessor(text):
    return text, lambda i: i


def get_preprocessor() -> PreprocessorFunc:
    return new2Preprocessor


#######################################################################
#
# PARSER SECTION - Don't edit! CHANGES WILL BE OVERWRITTEN!
#
#######################################################################

class new2Grammar(Grammar):
    r"""Parser for a new2 source file, with this grammar:

    document = ~ { sentence } §EOF
    sentence = part {"," part } "."
    part = { WORD }+
    WORD = /[\w’]+/~
    EOF = !/./
    """
    source_hash__ = "7a9984368b1c959222099d389d18c54f"
    parser_initialization__ = "upon instantiation"
    COMMENT__ = r''
    WHITESPACE__ = r'\s*'
    WSP_RE__ = mixin_comment(whitespace=WHITESPACE__, comment=COMMENT__)
    wspL__ = ''
    wspR__ = WSP__
    whitespace__ = Whitespace(WSP__)
    EOF = NegativeLookahead(RegExp('.'))
    WORD = _RE('[\\w’]+')
    part = OneOrMore(WORD)
    sentence = Series(part, ZeroOrMore(Series(_Token(","), part)), _Token("."))
    document = Series(whitespace__, ZeroOrMore(sentence), EOF, mandatory=2)
    root__ = document


def get_grammar() -> new2Grammar:
    global thread_local_new2_grammar_singleton
    try:
        grammar = thread_local_new2_grammar_singleton
    except NameError:
        thread_local_new2_grammar_singleton = new2Grammar()
        grammar = thread_local_new2_grammar_singleton
    return grammar


#######################################################################
#
# AST SECTION - Can be edited. Changes will be preserved.
#
#######################################################################

new2_AST_transformation_table = {
    # AST Transformations for the new2-grammar
    "+": remove_empty,
    "document": [remove_whitespace, reduce_single_child],
    "sentence": [flatten],
    "part": [],
    "WORD": [remove_whitespace, reduce_single_child],
    "EOF": [],
    ":_Token": [remove_whitespace, reduce_single_child],
    ":_RE": reduce_single_child,
    "*": replace_by_single_child
}


def new2Transform() -> TransformationDict:
    return partial(traverse, processing_table=new2_AST_transformation_table.copy())


def get_transformer() -> TransformationFunc:
    global thread_local_new2_transformer_singleton
    try:
        transformer = thread_local_new2_transformer_singleton
    except NameError:
        thread_local_new2_transformer_singleton = new2Transform()
        transformer = thread_local_new2_transformer_singleton
    return transformer


#######################################################################
#
# COMPILER SECTION - Can be edited. Changes will be preserved.
#
#######################################################################

class new2Compiler(Compiler):
    """Compiler for the abstract-syntax-tree of a new2 source file.
    """

    def __init__(self, grammar_name="new2", grammar_source=""):
        super(new2Compiler, self).__init__(grammar_name, grammar_source)
        assert re.match('\w+\Z', grammar_name)

    def on_document(self, node):
        return self.fallback_compiler(node)

    # def on_WORD(self, node):
    #     return node

    # def on_EOF(self, node):
    #     return node


def get_compiler(grammar_name="new2", grammar_source="") -> new2Compiler:
    global thread_local_new2_compiler_singleton
    try:
        compiler = thread_local_new2_compiler_singleton
        compiler.set_grammar_name(grammar_name, grammar_source)
    except NameError:
        thread_local_new2_compiler_singleton = \
            new2Compiler(grammar_name, grammar_source)
        compiler = thread_local_new2_compiler_singleton
    return compiler


#######################################################################
#
# END OF DHPARSER-SECTIONS
#
#######################################################################

def compile_src(source, log_dir=''):
    """Compiles ``source`` and returns (result, errors, ast).
    """
    with logging(log_dir):
        compiler = get_compiler()
        cname = compiler.__class__.__name__
        log_file_name = os.path.basename(os.path.splitext(source)[0]) \
            if is_filename(source) < 0 else cname[:cname.find('.')] + '_out'
        result = compile_source(source, get_preprocessor(), get_grammar(),
                                get_transformer(), compiler)
    return result


if __name__ == "__main__":
    if len(sys.argv) > 1:
        try:
            grammar_file_name = os.path.basename(__file__).replace('Compiler.py', '.ebnf')
            if grammar_changed(new2Grammar, grammar_file_name):
                print("Grammar has changed. Please recompile Grammar first.")
                sys.exit(1)
        except FileNotFoundError:
            print('Could not check for changed grammar, because grammar file "%s" was not found!'
                  % grammar_file_name)
        file_name, log_dir = sys.argv[1], ''
        if file_name in ['-d', '--debug'] and len(sys.argv) > 2:
            file_name, log_dir = sys.argv[2], 'LOGS'
        result, errors, ast = compile_src(file_name, log_dir)
        if errors:
            cwd = os.getcwd()
            rel_path = file_name[len(cwd):] if file_name.startswith(cwd) else file_name
            for error in errors:
                print(rel_path + ':' + str(error))
            sys.exit(1)
        else:
            print(result.as_xml() if isinstance(result, Node) else result)
    else:
        print("Usage: new2Compiler.py [FILENAME]")
experimental/new2/tst_new2_grammar.py
deleted
100755 → 0
#!/usr/bin/python3
"""tst_new2_grammar.py - runs the unit tests for the new2-grammar
"""
import os
import sys

sys.path.append(r'/home/eckhart/Entwicklung/DHParser')

scriptpath = os.path.dirname(__file__)

try:
    from DHParser import dsl
    import DHParser.log
    from DHParser import testing
except ModuleNotFoundError:
    print('Could not import DHParser. Please adjust sys.path in file '
          '"%s" manually' % __file__)
    sys.exit(1)


def recompile_grammar(grammar_src, force):
    with DHParser.log.logging(False):
        # recompiles Grammar only if it has changed
        if not dsl.recompile_grammar(grammar_src, force=force):
            print('\nErrors while recompiling "%s":' % grammar_src +
                  '\n--------------------------------------\n\n')
            with open('new2_ebnf_ERRORS.txt') as f:
                print(f.read())
            sys.exit(1)


def run_grammar_tests(glob_pattern):
    with DHParser.log.logging(False):
        error_report = testing.grammar_suite(
            os.path.join(scriptpath, 'grammar_tests'),
            get_grammar, get_transformer,
            fn_patterns=[glob_pattern], report=True, verbose=True)
    return error_report


if __name__ == '__main__':
    arg = sys.argv[1] if len(sys.argv) > 1 else '*_test_*.ini'
    if arg.endswith('.ebnf'):
        recompile_grammar(arg, force=True)
    else:
        recompile_grammar(os.path.join(scriptpath, 'new2.ebnf'), force=False)
        sys.path.append('.')
        from new2Compiler import get_grammar, get_transformer
        error_report = run_grammar_tests(glob_pattern=arg)
        if error_report:
            print('\n')
            print(error_report)
            sys.exit(1)
        print('ready.\n')
experimental/ws/README.md
deleted
100644 → 0
# ws
PLACE A SHORT DESCRIPTION HERE
Author: AUTHOR'S NAME <EMAIL>, AFFILIATION

## License

ws is open source software under the
[Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0)

Copyright YEAR AUTHOR'S NAME <EMAIL>, AFFILIATION
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
experimental/ws/example.dsl
deleted
100644 → 0
Life is but a walking shadow
experimental/ws/grammar_tests/01_test_word.ini
deleted
100644 → 0
[match:WORD]
M1: word
M2: one_word_with_underscores

[fail:WORD]
F1: two words
experimental/ws/grammar_tests/02_test_document.ini
deleted
100644 → 0
[match:document]
M1:
"""This
is
a
sequence
of
words
extending
over
several
lines"""
M2:
"""
This
sequence
contains
leading
whitespace"""
[fail:document]
F1:
"""This
test
should
fail,
because
neither
comma
nor
full
have
been
defined
anywhere."""
experimental/ws/tst_ws_grammar.py
deleted
100755 → 0
#!/usr/bin/python3
"""tst_ws_grammar.py - runs the unit tests for the ws-grammar
"""
import os
import sys

sys.path.append(r'/home/eckhart/Entwicklung/DHParser')

scriptpath = os.path.dirname(__file__)

try:
    from DHParser import dsl
    import DHParser.log
    from DHParser import testing
except ModuleNotFoundError:
    print('Could not import DHParser. Please adjust sys.path in file '
          '"%s" manually' % __file__)
    sys.exit(1)


def recompile_grammar(grammar_src, force):
    with DHParser.log.logging(False):
        # recompiles Grammar only if it has changed
        if not dsl.recompile_grammar(grammar_src, force=force):
            print('\nErrors while recompiling "%s":' % grammar_src +
                  '\n--------------------------------------\n\n')
            with open('ws_ebnf_ERRORS.txt') as f:
                print(f.read())
            sys.exit(1)


def run_grammar_tests(glob_pattern):
    with DHParser.log.logging(False):
        error_report = testing.grammar_suite(
            os.path.join(scriptpath, 'grammar_tests'),
            get_grammar, get_transformer,
            fn_patterns=[glob_pattern], report=True, verbose=True)
    return error_report


if __name__ == '__main__':
    arg = sys.argv[1] if len(sys.argv) > 1 else '*_test_*.ini'
    if arg.endswith('.ebnf'):
        recompile_grammar(arg, force=True)
    else:
        recompile_grammar(os.path.join(scriptpath, 'ws.ebnf'), force=False)
        sys.path.append('.')
        from wsCompiler import get_grammar, get_transformer
        error_report = run_grammar_tests(glob_pattern=arg)
        if error_report:
            print('\n')
            print(error_report)
            sys.exit(1)
        print('ready.\n')
experimental/ws/ws.ebnf
deleted
100644 → 0
# ws-grammar
#######################################################################
#
# EBNF-Directives
#
#######################################################################
@ whitespace = vertical # implicit whitespace, includes any number of line feeds
@ literalws = right # literals have implicit whitespace on the right hand side
@ comment = /#.*/ # comments range from a '#'-character to the end of the line
@ ignorecase = False # literals and regular expressions are case-sensitive
#######################################################################
#
# Structure and Components
#
#######################################################################
document = ~ { WORD } §EOF # root parser: a sequence of words preceded by whitespace
# until the end of file
#######################################################################
#
# Regular Expressions
#
#######################################################################
WORD = /\w+/ ~ # a sequence of letters, optional trailing whitespace
EOF = !/./ # no more characters ahead, end of file reached
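As a side note (an editor's illustration, not part of the deleted file): the ``@ whitespace`` and ``@ comment`` directives above are combined into a single implicit-whitespace pattern. The regular expression below is only an approximation of that combination; DHParser's ``mixin_comment`` builds the real one::

    import re

    WHITESPACE = r'\s*'   # '@ whitespace = vertical' (may span line feeds)
    COMMENT = r'#.*'      # '@ comment = /#.*/'

    # approximate combination: any amount of whitespace, optionally followed by a comment
    WSP = re.compile('(?:%s(?:%s)?)*' % (WHITESPACE, COMMENT))

    text = "word   # a comment\n  next"
    m = WSP.match(text, 4)        # skip implicit whitespace after 'word'
    print(repr(text[m.end():]))   # -> 'next'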