badw-it / DHParser
Commit 8d307fb7, authored Sep 03, 2017 by Eckhart Arnold
Commit message: - further (original German: "- weitere")
Parent: a4ffb225
Changes: 10 changed files with 97 additions and 20 deletions
Changed files:
DHParser/parser.py +74 -15
DHParser/syntaxtree.py +2 -0
DHParser/toolkit.py +1 -1
buildpackages.sh +1 -1
examples/BibTeX/grammar_tests/00_test_regexes.ini +0 -0
examples/BibTeX/grammar_tests/01_test_entry.ini +5 -0
examples/LaTeX/LaTeX.ebnf +3 -0
examples/LaTeX/LaTeXCompiler.py +4 -1
examples/LaTeX/grammar_tests/02_test_paragraph.ini +5 -1
setup.cfg +2 -1
DHParser/parser.py
...
...
@@ -278,8 +278,8 @@ def add_parser_guard(parser_func):
elif grammar.memoization__:
    # otherwise also cache None-results
    parser.visited[location] = (None, rest)
- elif ((grammar.memoization__ or location in grammar.recursion_locations__)
-       and grammar.last_rb__loc__ > location):
+ elif (grammar.last_rb__loc__ > location
+       and (grammar.memoization__ or location in grammar.recursion_locations__)):
# - variable manipulating parsers will not be entered into the cache,
# because caching would interfere with changes of variable state
# - in case of left recursion, the first recursive step that
...
...
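The hunk above only swaps the operands of `and` in the caching condition. The following is a minimal sketch, not DHParser code, that checks the two orderings agree for every combination of inputs; the function and parameter names are hypothetical stand-ins for the attributes used in the diff.

```python
from itertools import product


def old_condition(memoization: bool, in_recursion_locations: bool,
                  last_rb_loc: int, location: int) -> bool:
    """Operand order before the commit (hypothetical stand-in)."""
    return (memoization or in_recursion_locations) and last_rb_loc > location


def new_condition(memoization: bool, in_recursion_locations: bool,
                  last_rb_loc: int, location: int) -> bool:
    """Operand order after the commit: the integer comparison comes first."""
    return last_rb_loc > location and (memoization or in_recursion_locations)


# Compare both orderings over all flag combinations and a few locations.
for memo, rec, rb_loc in product((False, True), (False, True), (0, 1, 2)):
    assert old_condition(memo, rec, rb_loc, 1) == new_condition(memo, rec, rb_loc, 1)
```

Because `and` short-circuits, the new ordering can skip the membership test `location in grammar.recursion_locations__` whenever the comparison is already false. Whether that matters for performance in practice is not stated in the commit.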
@@ -492,7 +492,7 @@ class Grammar:
>>> number_parser("3.1416").content()
'3.1416'
- Collecting the parsers that define a grammar in a descentand class of
+ Collecting the parsers that define a grammar in a descendant class of
class Grammar and assigning the named parsers to class variables
rather than global variables has several advantages:
...
...
@@ -500,11 +500,12 @@ class Grammar:
2. The parser names of named parsers do not need to be passed to the
constructor of the Parser object explicitly, but it suffices to
- assign them to class variables.
+ assign them to class variables, which results in better
+ readability of the Python code.
- 3. The parsers in class do not necessarily need to be connected to one
- single root parser, which is helpful for testing and building up a
- parser successively of several components.
+ 3. The parsers in the class do not necessarily need to be connected
+ to one single root parser, which is helpful for testing and
+ building up a parser successively of several components.
As a consequence, though, it is highly recommended that a Grammar
class should not define any other variables or methods with names
...
...
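Point 2 of the docstring above says that parser names follow from the class variables they are assigned to. A minimal sketch of that idea, assuming hypothetical `Parser` and `Grammar` stand-ins: the docstring indicates that DHParser assigns names when the grammar class is first instantiated (via `_assign_parser_names()`), whereas this sketch uses `__init_subclass__` purely for brevity.

```python
class Parser:
    """Hypothetical stand-in for a parser object; it only carries a name."""

    def __init__(self, name: str = '') -> None:
        self.name = name


class Grammar:
    """Hypothetical base class that names parsers after their class variables."""

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        for attr, value in vars(cls).items():
            # Only unnamed parsers receive the name of the variable holding them.
            if isinstance(value, Parser) and not value.name:
                value.name = attr


class Arithmetic(Grammar):
    number = Parser()      # no name passed to the constructor
    operator = Parser()


print(Arithmetic.number.name, Arithmetic.operator.name)
```

The point of the design is that the grammar author writes each name once, as the left-hand side of an assignment.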
@@ -551,14 +552,18 @@ class Grammar:
(no comments, horizontal right aligned whitespace) don't fit:
COMMENT__: regular expression string for matching comments
WSP__: regular expression for whitespace and comments
wspL__: regular expression string for left aligned whitespace,
which either equals WSP__ or is empty.
wspR__: regular expression string for right aligned whitespace,
which either equals WSP__ or is empty.
root__: The root parser of the grammar. Theoretically, all parsers of the
grammar should be reachable by the root parser. However, for testing
of yet incomplete grammars class Grammar does not assume that this
is the case.
parser_initializiation__: Before the parser class (!) has been initialized,
which happens upon the first time it is instantiated (see docstring for
method `_assign_parser_names()` for an explanation), this class
...
...
@@ -568,7 +573,7 @@ class Grammar:
Attributes:
all_parsers__: A set of all parsers connected to this grammar object
- hostory_tracking__: A flag indicating that the parsing history shall
+ history_tracking__: A flag indicating that the parsing history shall
be tracked
wsp_left_parser__: A parser for the default left-adjacent-whitespace
...
...
@@ -576,6 +581,7 @@ class Grammar:
default is empty. The default whitespace will be used by parsers
`Token` and, if no other parsers are passed to its constructor,
by parser `RE'.
wsp_right_parser__: The same for the default right-adjacent-whitespace.
Both wsp_left_parser__ and wsp_right_parser__ merely serve the
purpose to avoid having to specify the default whitespace
...
...
@@ -587,16 +593,20 @@ class Grammar:
document__: the text that has most recently been parsed or that is
currently being parsed.
_reversed__: the same text in reverse order - needed by the `Lookbehind'-
parsers.
variables__: A mapping for variable names to a stack of their respective
string values - needed by the `Capture`-, `Retrieve`- and `Pop`-
parsers.
rollback__: A list of tuples (location, rollback-function) that are
deposited by the `Capture`- and `Pop`-parsers. If the parsing
process reaches a dead end then all rollback-functions up to
the point to which it retreats will be called and the state
of the variable stack restored accordingly.
last_rb__loc__: The last, i.e. most advanced location in the text
where a variable changing operation occurred. If the parser
backtracks to a location at or before `last_rb__loc__` (which,
...
...
@@ -605,23 +615,28 @@ class Grammar:
changing operations is necessary that occurred after the
location to which the parser backtracks. This is done by
calling method `.rollback_to__(location)`.
call_stack__: A stack of all parsers that have been called. This
is required for recording the parser history (for debugging)
and, eventually, i.e. one day in the future, for tracing through
the parsing process.
history__: A list of parser-call-stacks. A parser-call-stack is
appended to the list each time a parser either matches, fails
or if a parser-error occurs.
moving_forward__: This flag indicates that the parsing process is currently
moving forward . It is needed to reduce noise in history recording
and should not be considered as having a valid value if history
recording is turned off! (See `add_parser_guard` and its local
function `guarded_call`)
recursion_locations__: Stores the locations where left recursion was
detected. Needed to provide minimal memoization for the left
recursion detection algorithm, but, strictly speaking, superfluous
if full memoization is enabled. (See `add_parser_guard` and its
local function `guarded_call`)
memoization__: Turns full memoization on or off. Turning memoization off
results in less memory usage and sometimes reduced parsing time.
In some situations it may drastically increase parsing time, so
...
...
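The `rollback__` and `last_rb__loc__` attributes described above can be pictured with a small sketch. It assumes that locations are plain character offsets that grow as parsing advances; the class and method names are hypothetical and this is not DHParser's implementation.

```python
from typing import Callable, Dict, List, Tuple


class VariableState:
    """Hypothetical sketch of a variable stack with a rollback list."""

    def __init__(self) -> None:
        self.variables: Dict[str, List[str]] = {}
        # Each entry pairs a location with a function that undoes one change.
        self.rollback: List[Tuple[int, Callable[[], object]]] = []

    def capture(self, location: int, name: str, value: str) -> None:
        """Push a value and remember how to undo the push."""
        stack = self.variables.setdefault(name, [])
        stack.append(value)
        self.rollback.append((location, stack.pop))

    def rollback_to(self, location: int) -> None:
        """Undo every change recorded at a position after `location`."""
        while self.rollback and self.rollback[-1][0] > location:
            _, undo = self.rollback.pop()
            undo()


state = VariableState()
state.capture(3, 'tag', 'a')
state.capture(10, 'tag', 'b')
state.rollback_to(5)           # the capture at offset 10 is undone
print(state.variables['tag'])
```

Keeping the undo functions in recording order means a retreat only has to pop entries from the end of the list until it reaches the target location.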
@@ -1079,7 +1094,7 @@ class RE(Parser):
>>> result.structure()
'(:RE (:RegExp "Haus") (:Whitespace " "))'
>>> parser(' Haus').content()
- ' <<< Error on " Haus" | Parser did not match! Invalid source file? >>> '
+ ' <<< Error on " Haus" | Parser did not match! Invalid source file?\\n Most advanced: None\\n Last match: None; >>> '
EBNF-Notation: `/ ... /~` or `~/ ... /` or `~/ ... /~`
EBNF-Example: `word = /\w+/~`
...
...
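The `RE` doctest above matches `'Haus '` but rejects `' Haus'`, because whitespace is only consumed to the right of the regular expression (the `/ ... /~` form). A minimal sketch of that behaviour with the standard `re` module follows; the helper name and the whitespace pattern `WSP` are assumptions for illustration, not DHParser names.

```python
import re
from typing import Optional, Tuple

WSP = re.compile(r'[ \t]*')   # assumed whitespace definition, always matches


def match_with_trailing_whitespace(pattern: str, text: str
                                   ) -> Optional[Tuple[str, str, str]]:
    """Match `pattern` at the start of `text`, then optional whitespace.

    Returns (matched text, whitespace, rest) or None if `pattern` fails.
    """
    main = re.match(pattern, text)
    if main is None:
        return None
    wsp = WSP.match(text, main.end())
    return main.group(0), wsp.group(0), text[wsp.end():]


print(match_with_trailing_whitespace(r'\w+', 'Haus '))
print(match_with_trailing_whitespace(r'\w+', ' Haus'))
```

The second call returns `None` because the leading blank is not covered by `\w+` and no left-adjacent whitespace is consumed in this sketch.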
@@ -1247,7 +1262,7 @@ class Option(UnaryOperator):
>>> Grammar(number)('3.14159').content()
'3.14159'
>>> Grammar(number)('3.14159').structure()
- '(:Series (:Optional) (:RegExp "3") (:Optional (:RegExp ".14159")))'
+ '(:Series (:Option) (:RegExp "3") (:Option (:RegExp ".14159")))'
>>> Grammar(number)('-1').content()
'-1'
...
...
@@ -1285,6 +1300,8 @@ class ZeroOrMore(Option):
>>> sentence = ZeroOrMore(RE(r'\w+,?')) + Token('.')
>>> Grammar(sentence)('Wo viel der Weisheit, da auch viel des Grämens.').content()
'Wo viel der Weisheit, da auch viel des Grämens.'
+ >>> Grammar(sentence)('.').content() # an empty sentence also matches
+ '.'
EBNF-Notation: `{ ... }`
EBNF-Example: `sentence = { /\w+,?/ } "."`
...
...
@@ -1308,6 +1325,22 @@ class ZeroOrMore(Option):
class OneOrMore(UnaryOperator):
+ """
+ `OneOrMore` applies a parser repeatedly as long as this parser
+ matches. Other than `ZeroOrMore` which always matches, at least
+ one match is required by `OneOrMore`.
+ Examples:
+ >>> sentence = OneOrMore(RE(r'\w+,?')) + Token('.')
+ >>> Grammar(sentence)('Wo viel der Weisheit, da auch viel des Grämens.').content()
+ 'Wo viel der Weisheit, da auch viel des Grämens.'
+ >>> Grammar(sentence)('.').content() # an empty sentence also matches
+ ' <<< Error on "." | Parser did not match! Invalid source file?\\n Most advanced: None\\n Last match: None; >>> '
+ EBNF-Notation: `{ ... }+`
+ EBNF-Example: `sentence = { /\w+,?/ }+`
+ """
def __init__(self, parser: Parser, name: str = '') -> None:
    super(OneOrMore, self).__init__(parser, name)
    assert not isinstance(parser, Option), \
...
...
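The difference between `ZeroOrMore` and `OneOrMore` documented above can be sketched with two small functions over the standard `re` module. The function names and the `WORD` pattern are hypothetical; the sketch only mirrors the documented behaviour that zero repetitions succeed for one and fail for the other.

```python
import re
from typing import List, Optional, Tuple

WORD = re.compile(r'\w+,?\s*')   # assumed item pattern, similar to the doctest


def zero_or_more(text: str) -> Tuple[List[str], str]:
    """Collect as many items as possible; zero items is still a match."""
    items: List[str] = []
    pos = 0
    while True:
        m = WORD.match(text, pos)
        if m is None or m.end() == pos:   # stop on failure or empty match
            break
        items.append(m.group(0))
        pos = m.end()
    return items, text[pos:]


def one_or_more(text: str) -> Optional[Tuple[List[str], str]]:
    """Like zero_or_more, but fail (return None) if nothing matched."""
    items, rest = zero_or_more(text)
    return (items, rest) if items else None


print(zero_or_more('.'))   # matches with an empty item list
print(one_or_more('.'))    # fails
```

Building `one_or_more` on top of `zero_or_more` keeps the repetition loop in one place; only the success criterion differs.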
@@ -1336,6 +1369,21 @@ class OneOrMore(UnaryOperator):
class Series(NaryOperator):
+ """
+ Matches if each of a series of parsers matches exactly in the order of
+ the series.
+ Example:
+ >>> variable_name = RegExp('(?!\d)\w') + RE('\w*')
+ >>> Grammar(variable_name)('variable_1').content()
+ 'variable_1'
+ >>> Grammar(variable_name)('1_variable').content()
+ ' <<< Error on "1_variable" | Parser did not match! Invalid source file?\\n Most advanced: None\\n Last match: None; >>> '
+ EBNF-Notation: `... ...` (sequence of parsers separated by a blank or new line)
+ EBNF-Example: `series = letter letter_or_digit`
+ """
def __init__(self, *parsers: Parser, name: str = '') -> None:
    super(Series, self).__init__(*parsers, name=name)
    assert len(self.parsers) >= 1
...
...
@@ -1356,6 +1404,9 @@ class Series(NaryOperator):
def __repr__(self):
    return " ".join(parser.repr for parser in self.parsers)
+ # The following operator definitions add syntactical sugar, so one can write:
+ # `RE('\d+') + Optional(RE('\.\d+)` instead of `Series(RE('\d+'), Optional(RE('\.\d+))`
def __add__(self, other: Parser) -> 'Series':
    other_parsers = cast('Series', other).parsers if isinstance(other, Series) \
        else cast(Tuple[Parser, ...], (other,))  # type: Tuple[Parser, ...]
...
...
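The comment added in this hunk describes `+` as syntactic sugar for building a `Series`. A minimal sketch of how such operator overloading can flatten chained sums into one sequence follows; the `Parser` and `Series` classes here are simplified, hypothetical stand-ins and omit all actual parsing.

```python
from typing import List, Tuple


class Parser:
    """Hypothetical stand-in that only carries a label."""

    def __init__(self, label: str) -> None:
        self.label = label

    def __add__(self, other: 'Parser') -> 'Series':
        return Series(self, other)

    def __repr__(self) -> str:
        return self.label


class Series(Parser):
    """Sequence of parsers; nested Series arguments are flattened."""

    def __init__(self, *parsers: Parser) -> None:
        super().__init__('series')
        flat: List[Parser] = []
        for p in parsers:
            flat.extend(p.parsers if isinstance(p, Series) else (p,))
        self.parsers: Tuple[Parser, ...] = tuple(flat)

    def __repr__(self) -> str:
        return ' '.join(repr(p) for p in self.parsers)


a, b, c = Parser('a'), Parser('b'), Parser('c')
print(repr(a + b + c))   # one flat Series rather than a nested one
```

Flattening means `a + b + c` yields a single three-element series, which is the same shape one would get from writing `Series(a, b, c)` directly.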
@@ -1385,12 +1436,15 @@ class Alternative(NaryOperator):
# the order of the sub-expression matters!
>>> number = RE('\d+') | RE('\d+') + RE('\.') + RE('\d+')
>>> Grammar(number)("3.1416").content()
- '3 <<< Error on ".1416" | Parser stopped before end! trying to recover... >>> '
+ '3 <<< Error on ".141" | Parser stopped before end! trying to recover... >>> '
# the most selective expression should be put first:
>>> number = RE('\d+') + RE('\.') + RE('\d+') | RE('\d+')
>>> Grammar(number)("3.1416").content()
'3.1416'
+ EBNF-Notation: `... | ...`
+ EBNF-Example: `sentence = /\d+\.\d+/ | /\d+/`
"""
def __init__(self, *parsers: Parser, name: str = '') -> None:
...
...
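The doctest above shows that the order of alternatives matters: the first alternative that matches wins, even if a later one would consume more input. A minimal sketch of ordered choice with the standard `re` module, using a hypothetical helper name:

```python
import re
from typing import List, Optional


def first_match(alternatives: List[str], text: str) -> Optional[str]:
    """Return the text matched by the first alternative that matches."""
    for pattern in alternatives:
        m = re.match(pattern, text)
        if m:
            return m.group(0)
    return None


# Less selective pattern first: the match stops at the integer part.
print(first_match([r'\d+', r'\d+\.\d+'], '3.1416'))
# Most selective pattern first: the whole number is matched.
print(first_match([r'\d+\.\d+', r'\d+'], '3.1416'))
```

This is why the docstring recommends putting the most selective expression first.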
@@ -1410,6 +1464,15 @@ class Alternative(NaryOperator):
def __repr__(self):
    return '(' + ' | '.join(parser.repr for parser in self.parsers) + ')'
+ def reset(self):
+     super(Alternative, self).reset()
+     self.been_here = {}
+     return self
+ # The following operator definitions add syntactical sugar, so one can write:
+ # `RE('\d+') + RE('\.') + RE('\d+') | RE('\d+')` instead of:
+ # `Alternative(Series(RE('\d+'), RE('\.'), RE('\d+')), RE('\d+'))`
def __or__(self, other: Parser) -> 'Alternative':
    other_parsers = cast('Alternative', other).parsers if isinstance(other, Alternative) \
        else cast(Tuple[Parser, ...], (other,))  # type: Tuple[Parser, ...]
...
...
@@ -1426,10 +1489,6 @@ class Alternative(NaryOperator):
self.parsers += other_parsers
return self
- def reset(self):
-     super(Alternative, self).reset()
-     self.been_here = {}
-     return self
...
...
DHParser/syntaxtree.py
...
...
@@ -261,6 +261,8 @@ class Node(collections.abc.Sized):
# assert ((isinstance(result, tuple) and all(isinstance(child, Node) for child in result))
# or isinstance(result, Node)
# or isinstance(result, str)), str(result)
+ # Possible optimization: Do not allow single nodes as argument:
+ # assert not isinstance(result, Node)
self._result = (result,) if isinstance(result, Node) else str(result) \
    if isinstance(result, StringView) else result or ''  # type: StrictResultType
self.children = cast(ChildrenType, self._result) \
\
...
...
DHParser/toolkit.py
...
...
@@ -299,7 +299,7 @@ def error_messages(source_text, errors) -> List[str]:
def escape_re(s) -> str:
    """Returns `s` with all regular expression special characters escaped.
    """
-     assert isinstance(s, str)
+     # assert isinstance(s, str)
    re_chars = r"\.^$*+?{}[]()#<>=|!"
    for esc_ch in re_chars:
        s = s.replace(esc_ch, '\\' + esc_ch)
...
...
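For reference, a self-contained sketch of the escaping loop shown in the hunk above, reconstructed from the visible lines only (the `return` statement is an assumption, since the end of the function is elided in the diff). Note that the backslash is the first character of `re_chars`, so backslashes inserted for later characters are not escaped a second time.

```python
import re

RE_CHARS = r"\.^$*+?{}[]()#<>=|!"   # backslash first, as in the diff


def escape_re(s: str) -> str:
    """Return `s` with regular expression special characters escaped."""
    for ch in RE_CHARS:
        s = s.replace(ch, '\\' + ch)
    return s   # assumed: the original's return is outside the shown hunk


escaped = escape_re('1+1=2?')
print(escaped)
# The escaped pattern matches the original string literally.
print(re.fullmatch(escaped, '1+1=2?') is not None)
```

The standard library offers `re.escape` for the same purpose; the set of characters it escapes differs between Python versions, so the two functions are not guaranteed to produce identical output.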
buildpackages.sh
#!/bin/sh
- python3 setup.py sdist bdist
+ python3 setup.py sdist bdist_wheel
examples/BibTeX/grammar_tests/00_test_regexes.ini
0 → 100644
examples/BibTeX/grammar_tests/01_test_entry.ini
0 → 100644
+ [match:content]
+ simple: {Edward N. Zalta}
+ nested_braces: {\url{https://plato.stanford.edu/archives/fall2013/entries/thomas-kuhn/}}
examples/LaTeX/LaTeX.ebnf
...
...
@@ -74,6 +74,7 @@ tabular_row = (multicolumn | tabular_cell) { "&" (multicolumn | tabular_
tabular_cell = { line_element //~ }
tabular_config = "{" /[lcr|]+/~ §"}"
#### paragraphs and sequences of paragraphs ####
block_of_paragraphs = "{" [sequence] §"}"
...
...
@@ -82,6 +83,7 @@ paragraph = { !blockcmd text_element //~ }+
text_element = line_element | LINEFEED
line_element = text | block | inline_environment | command
#### inline enivronments ####
inline_environment = known_inline_env | generic_inline_env
...
...
@@ -95,6 +97,7 @@ end_environment = /\\end{/ §::NAME §/}/
inline_math = /\$/ /[^$]*/ §/\$/
#### commands ####
command = known_command | text_command | generic_command
...
...
examples/LaTeX/LaTeXCompiler.py
...
...
@@ -123,6 +123,7 @@ class LaTeXGrammar(Grammar):
tabular_cell = { line_element //~ }
tabular_config = "{" /[lcr|]+/~ §"}"
#### paragraphs and sequences of paragraphs ####
block_of_paragraphs = "{" [sequence] §"}"
...
...
@@ -131,6 +132,7 @@ class LaTeXGrammar(Grammar):
text_element = line_element | LINEFEED
line_element = text | block | inline_environment | command
#### inline enivronments ####
inline_environment = known_inline_env | generic_inline_env
...
...
@@ -144,6 +146,7 @@ class LaTeXGrammar(Grammar):
inline_math = /\$/ /[^$]*/ §/\$/
#### commands ####
command = known_command | text_command | generic_command
...
...
@@ -220,7 +223,7 @@ class LaTeXGrammar(Grammar):
paragraph = Forward()
tabular_config = Forward()
text_element = Forward()
- source_hash__ = "ed181ac517b686f843e13d5783527fe3"
+ source_hash__ = "57dd004091e87ff603b51f0a47857cf4"
parser_initialization__ = "upon instantiation"
COMMENT__ = r'%.*'
WHITESPACE__ = r'[ \t]*(?:\n(?![ \t]*\n)[ \t]*)?'
...
...
examples/LaTeX/grammar_tests/02_test_paragraph.ini
...
...
@@ -12,8 +12,11 @@
4 : Paragraphs % may contain comments like the comment above
- % or like this comment.
+ % or like this t comment.
Comment lines do not break paragraphs.
% There can even be several
% comment lines in sequence.
5 : Paragraphs may contain {\em emphasized} or {\bf bold} text.
Most of these commands can have different forms as, for example:
...
...
@@ -67,6 +70,7 @@
% and comments
% or sequences of comment lines
In the end such a sequence counts
...
...
setup.cfg
- [bdist]
+ [bdist_wheel]
universal=1