badw-it / DHParser
Commit 4f6c3ae8
authored Aug 05, 2017 by Eckhart Arnold

further LaTeX tests

parent c13ed3d3
6 changed files
DHParser/parser.py
...
...
@@ -294,13 +294,19 @@ class Parser(ParserBase, metaclass=ParserMetaClass):
     only to their class name, and not to the individual parser.

     Parser objects are callable and parsing is done by calling a parser
-    object with the text to parse. If the parser matches it returns
-    a tuple consisting of a node representing the root of the concrete
-    syntax tree resulting from the match as well as the substring
-    `text[i:]` where i is the length of matched text (which can be
-    zero in the case of parsers like `ZeroOrMore` or `Optional`).
-    If `i > 0` then the parser has "moved forward". If the parser does
-    not match it returns `(None, text).
+    object with the text to parse.
+
+    If the parser matches it returns a tuple consisting of a node
+    representing the root of the concrete syntax tree resulting from the
+    match as well as the substring `text[i:]` where i is the length of
+    matched text (which can be zero in the case of parsers like
+    `ZeroOrMore` or `Optional`). If `i > 0` then the parser has "moved
+    forward".
+
+    If the parser does not match it returns `(None, text)`. **Note** that
+    this is not the same as an empty match `("", text)`. An empty match
+    can for example be returned by the `ZeroOrMore`-parser in case the
+    contained parser is repeated zero times.
     """

     ApplyFunc = Callable[['Parser'], None]
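The return convention described in this docstring can be illustrated with a minimal sketch. The functions `regexp` and `optional` below are hypothetical toy stand-ins written for this note, not DHParser's classes, and plain strings take the place of syntax tree nodes. The point is only the difference between "no match" (`None`) and "empty match" (an empty but non-`None` result).

```python
import re
from typing import Callable, Optional, Tuple

# Toy stand-ins for illustration only; these are NOT DHParser's classes.
Result = Tuple[Optional[object], str]
ParserFunc = Callable[[str], Result]


def regexp(pattern: str) -> ParserFunc:
    """Return a parser that matches `pattern` at the start of the text."""
    compiled = re.compile(pattern)

    def parse(text: str) -> Result:
        m = compiled.match(text)
        if m is None:
            return None, text            # no match: node is None, text untouched
        i = m.end()
        return m.group(0), text[i:]      # match: "node" plus the remainder text[i:]
    return parse


def optional(parser: ParserFunc) -> ParserFunc:
    """Return a parser that always matches, possibly with an empty result."""
    def parse(text: str) -> Result:
        node, rest = parser(text)
        if node is None:
            return "", text              # empty match, which is not a failure
        return node, rest
    return parse


word = regexp(r'\w+')
print(word("abc def"))            # ('abc', ' def')
print(word("   "))                # (None, '   ')
print(optional(word)("   "))      # ('', '   ')
```

A caller has to test for `node is None` rather than for truthiness, because an empty match is falsy as well.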
...
...
@@ -674,8 +680,17 @@ class Grammar:
                     stitches.append(Node(None, rest))
                 result = Node(None, tuple(stitches))
         if any(self.variables__.values()):
-            result.add_error("Capture-retrieve-stack not empty after end of parsing: "
-                             + str(self.variables__))
+            error_str = "Capture-retrieve-stack not empty after end of parsing: " + \
+                str(self.variables__)
+            if result.children:
+                # add another child node at the end to ensure that the position
+                # of the error will be the end of the text. Otherwise, the error
+                # message above ("...after end of parsing") would appear illogical.
+                error_node = Node(ZOMBIE_PARSER, '')
+                error_node.add_error(error_str)
+                result.result = result.children + (error_node,)
+            else:
+                result.add_error(error_str)
         result.pos = 0  # calculate all positions
         return result
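Why the extra empty child node helps can be seen in a small sketch. `ToyNode` is a hypothetical, much simplified node class invented for this note; its position bookkeeping is an assumption and may differ from how DHParser's `Node` computes positions. An error attached to the root is reported at offset 0, while an error attached to an empty node appended as the last child is reported at the end of the text.

```python
from typing import List, Tuple, Union


class ToyNode:
    """Hypothetical, much simplified stand-in for a syntax tree node."""

    def __init__(self, result: Union[str, Tuple['ToyNode', ...]]) -> None:
        self.result = result
        self.errors = []  # type: List[str]
        self.pos = 0

    @property
    def children(self) -> Tuple['ToyNode', ...]:
        return self.result if isinstance(self.result, tuple) else ()

    def __len__(self) -> int:
        if self.children:
            return sum(len(child) for child in self.children)
        return len(self.result)

    def assign_positions(self, pos: int = 0) -> None:
        """Give every node the offset of its first character."""
        self.pos = pos
        for child in self.children:
            child.assign_positions(pos)
            pos += len(child)

    def collect_errors(self) -> List[Tuple[int, str]]:
        found = [(self.pos, msg) for msg in self.errors]
        for child in self.children:
            found.extend(child.collect_errors())
        return found


# Error attached directly to the root: reported at position 0.
flat = ToyNode((ToyNode("Hello "), ToyNode("world")))
flat.errors.append("stack not empty")
flat.assign_positions()
print(flat.collect_errors())      # [(0, 'stack not empty')]

# Error attached to an empty node appended at the end: reported at position 11.
root = ToyNode((ToyNode("Hello "), ToyNode("world")))
error_node = ToyNode("")
error_node.errors.append("stack not empty")
root.result = root.children + (error_node,)
root.assign_positions()
print(root.collect_errors())      # [(11, 'stack not empty')]
```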
...
...
@@ -886,14 +901,14 @@ class RE(Parser):
     Regular Expressions with optional leading or trailing whitespace.

     The RE-parser parses pieces of text that match a given regular
-    expression. Other than the ``RegExp``-Parser it can also skip
+    expression. Other than the ``RegExp``-Parser it can also skip
     "implicit whitespace" before or after the matched text.

-    The whitespace is in turn defined by a regular expression. It
-    should be made sure that this expression also matches the empty
-    string, e.g. use r'\s*' or r'[\t]+', but not r'\s+'. If the
-    respective parameters in the constructor are set to ``None`` the
-    default whitespace expression from the Grammar object will be used.
+    The whitespace is in turn defined by a regular expression. It should
+    be made sure that this expression also matches the empty string,
+    e.g. use r'\s*' or r'[\t]+', but not r'\s+'. If the respective
+    parameters in the constructor are set to ``None`` the default
+    whitespace expression from the Grammar object will be used.

     Example (allowing whitespace on the right hand side, but not on
     the left hand side of a regular expression):
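Whether a candidate whitespace expression also matches the empty string can be checked directly with Python's `re` module. The helper `matches_empty` is a hypothetical name introduced for this note. The check shows that any pattern built with the `+` quantifier, including the `r'[\t]+'` printed in the docstring, requires at least one character; `r'[ \t]*'` is included here only as an assumed alternative that does accept the empty string.

```python
import re


def matches_empty(pattern: str) -> bool:
    """Return True if `pattern` matches the empty string."""
    return re.fullmatch(pattern, '') is not None


for pattern in (r'\s*', r'[ \t]*', r'\s+', r'[\t]+'):
    print(pattern, matches_empty(pattern))
# \s*     True
# [ \t]*  True
# \s+     False
# [\t]+   False
```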
...
...
@@ -976,9 +991,8 @@ class RE(Parser):

 class Token(RE):
     """
-    Class Token parses simple strings. Any regular regular
-    expression commands will be interpreted as simple sequence of
-    characters.
+    Class Token parses simple strings. Any regular expression commands
+    will be interpreted as a simple sequence of characters.

     Other than that class Token is essentially a renamed version of
     class RE. Because tokens often have a particular semantic different
...
...
@@ -1000,16 +1014,16 @@ class Token(RE):
 ########################################################################
 #
-# Combinator parser classes (i.e. trunk classes of the parser tree)
+# Containing parser classes, i.e. parsers that contain other parsers
+# to which they delegate (i.e. trunk classes)
 #
 ########################################################################


 class UnaryOperator(Parser):
     """
-    Base class of all unary parser operators, i.e. parser that
-    contains one and only one other parser, like the optional
-    parser for example.
+    Base class of all unary parser operators, i.e. parsers that contain
+    one and only one other parser, like the optional parser for example.

     The UnaryOperator base class supplies __deepcopy__ and apply
     methods for unary parser operators. The __deepcopy__ method needs
...
...
@@ -1036,10 +1050,10 @@ class NaryOperator(Parser):
     contains one or more other parsers, like the alternative
     parser for example.

-    The NnaryOperator base class supplies __deepcopy__ and apply
-    methods for unary parser operators. The __deepcopy__ method needs
-    to be overwritten, however, if the constructor of a derived class
-    has additional parameters.
+    The NaryOperator base class supplies __deepcopy__ and apply methods
+    for n-ary parser operators. The __deepcopy__ method needs to be
+    overwritten, however, if the constructor of a derived class has
+    additional parameters.
     """
     def __init__(self, *parsers: Parser, name: str = '') -> None:
         super(NaryOperator, self).__init__(name)
...
...
@@ -1103,6 +1117,19 @@ class Optional(UnaryOperator):


 class ZeroOrMore(Optional):
+    """
+    `ZeroOrMore` applies a parser repeatedly as long as this parser
+    matches. Like `Optional` the `ZeroOrMore` parser always matches. In
+    case of zero repetitions, the empty match `((), text)` is returned.
+
+    Examples:
+    >>> sentence = ZeroOrMore(RE(r'\w+,?')) + Token('.')
+    >>> Grammar(sentence)('Wo viel der Weisheit, da auch viel des Grämens.').content()
+    'Wo viel der Weisheit, da auch viel des Grämens.'
+
+    EBNF-Notation: `{ ... }`
+    EBNF-Example:  `sentence = { /\w+,?/ } "."`
+    """
     def __call__(self, text: str) -> Tuple[Node, str]:
         results = ()  # type: Tuple[Node, ...]
         n = len(text) + 1
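The rest of `__call__` is elided in the diff, so the loop below is only a sketch of how a zero-or-more combinator can work. `regexp` and `zero_or_more` are hypothetical toy functions, plain strings stand in for nodes, and the progress guard (stop when the inner parser consumed nothing) is an assumption suggested by the `n = len(text) + 1` line, not a copy of DHParser's code.

```python
import re
from typing import Callable, Optional, Tuple

# Toy stand-ins for illustration only; these are NOT DHParser's classes.
Result = Tuple[Optional[object], str]
ParserFunc = Callable[[str], Result]


def regexp(pattern: str) -> ParserFunc:
    """Return a parser that matches `pattern` at the start of the text."""
    compiled = re.compile(pattern)

    def parse(text: str) -> Result:
        m = compiled.match(text)
        if m is None:
            return None, text
        return m.group(0), text[m.end():]
    return parse


def zero_or_more(parser: ParserFunc) -> ParserFunc:
    """Apply `parser` repeatedly; always matches, possibly with ()."""
    def parse(text: str) -> Result:
        results = ()  # type: Tuple[object, ...]
        while text:
            node, rest = parser(text)
            if node is None:
                break                    # inner parser failed: stop repeating
            if len(rest) == len(text):
                break                    # no progress: avoid an infinite loop
            results += (node,)
            text = rest
        return results, text
    return parse


words = zero_or_more(regexp(r'\w+,?\s*'))
print(words("Wo viel der Weisheit."))   # (('Wo ', 'viel ', 'der ', 'Weisheit'), '.')
print(words("."))                       # ((), '.')
```

With zero repetitions the result is the empty tuple `()`, which is still distinct from the `None` of a failed match.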
...
...
DHParser/syntaxtree.py
...
...
@@ -197,8 +197,6 @@ class Node:
         # self.pos: int  = 0  # continuous updating of pos values wastes a lot of time
         self._pos = -1  # type: int
         self.parser = parser or ZOMBIE_PARSER
-        self.error_flag = any(r.error_flag for r in self._children) \
-            if self._children else False  # type: bool

     def __str__(self):
         if self.children:
...
...
@@ -242,6 +240,8 @@ class Node:
         self._result = (result,) if isinstance(result, Node) else result or ''  # type: StrictResultType
         self._children = cast(ChildrenType, self._result) \
             if isinstance(self._result, tuple) else cast(ChildrenType, ())  # type: ChildrenType
+        self.error_flag = any(r.error_flag for r in self._children) \
+            if self._children else False  # type: bool

     @property
     def children(self) -> ChildrenType:
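Taken together, the two `syntaxtree.py` hunks move the `error_flag` computation next to the place where `_children` is assigned. The sketch below assumes that this place is a `result` setter, which the diff does not show explicitly. `ToyNode` is a hypothetical, simplified class for illustration; it shows that a flag derived from the children stays in sync when it is recomputed on every assignment of `result`.

```python
class ToyNode:
    """Hypothetical, simplified node; not DHParser's Node class."""

    def __init__(self, result) -> None:
        self.result = result          # runs the setter below

    @property
    def result(self):
        return self._result

    @result.setter
    def result(self, result) -> None:
        self._result = (result,) if isinstance(result, ToyNode) else result or ''
        self._children = self._result if isinstance(self._result, tuple) else ()
        # Derived from the children, so it is recomputed on every assignment.
        self.error_flag = any(c.error_flag for c in self._children)


leaf = ToyNode("broken")
leaf.error_flag = True                # pretend an error was recorded on the leaf
root = ToyNode("placeholder")
print(root.error_flag)                # False
root.result = (leaf,)
print(root.error_flag)                # True
```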
...
...
examples/LaTeX/LaTeX.ebnf
...
...
@@ -119,11 +119,11 @@ text = { cfgtext | (BRACKETS //~) }+
 cfgtext = { word_sequence | (ESCAPED //~) }+
 word_sequence = { TEXTCHUNK //~ }+
-no_command = "\begin{" | "\end" | structural
-blockcmd = /[\\]/ ( ( "begin{" | "end{" )
-                    ( "enumerate" | "itemize" | "figure" | "quote"
-                    | "quotation" | "tabular") "}"
-                  | structural | begin_generic_block | end_generic_block )
+no_command = "\begin{" | "\end" | BACKSLASH structural
+blockcmd = BACKSLASH ( ( "begin{" | "end{" )
+                       ( "enumerate" | "itemize" | "figure" | "quote"
+                       | "quotation" | "tabular") "}"
+                     | structural | begin_generic_block | end_generic_block )
 structural = "subsection" | "section" | "chapter" | "subsubsection"
              | "paragraph" | "subparagraph" | "item"
...
...
@@ -147,7 +147,8 @@ WSPC = /[ \t]+/ # (horizontal) whitespace
 LF = !PARSEP /[ \t]*\n[ \t]*/ # linefeed but not an empty line
 PARSEP = /[ \t]*(?:\n[ \t]*)+\n[ \t]*/ # at least one empty line, i.e.
 # [whitespace] linefeed [whitespace] linefeed
-EOF = /(?!.)/
 LB = /\s*?\n|$/ # backwards line break for Lookbehind-Operator
-# beginning of text marker '$' added for test code
\ No newline at end of file
+# beginning of text marker '$' added for test code
+BACKSLASH = /[\\]/
+EOF = /(?!.)/ # End-Of-File
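The two regular expressions `BACKSLASH` and `EOF` can be tried out with Python's `re` module. This is a sketch under the assumption that the patterns are compiled without extra flags; the diff does not show how DHParser compiles them. Note that without `re.DOTALL` the dot does not match a newline, so `(?!.)` also succeeds directly in front of a line break.

```python
import re

BACKSLASH = re.compile(r'[\\]')   # character class containing a single backslash
EOF = re.compile(r'(?!.)')        # succeeds where no character other than '\n' follows

print(BACKSLASH.match(r'\item') is not None)   # True
print(BACKSLASH.match('item') is not None)     # False
print(EOF.match('') is not None)               # True
print(EOF.match('text') is not None)           # False
print(EOF.match('\nmore') is not None)         # True, because '.' does not match '\n'
```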
examples/LaTeX/LaTeXCompiler.py
...
...
@@ -170,11 +170,11 @@ class LaTeXGrammar(Grammar):
     cfgtext = { word_sequence | (ESCAPED //~) }+
     word_sequence = { TEXTCHUNK //~ }+
-    no_command = "\begin{" | "\end" | structural
-    blockcmd = /[\\]/ ( ( "begin{" | "end{" )
-                        ( "enumerate" | "itemize" | "figure" | "quote"
-                        | "quotation" | "tabular") "}"
-                      | structural | begin_generic_block | end_generic_block )
+    no_command = "\begin{" | "\end" | BACKSLASH structural
+    blockcmd = BACKSLASH ( ( "begin{" | "end{" )
+                           ( "enumerate" | "itemize" | "figure" | "quote"
+                           | "quotation" | "tabular") "}"
+                         | structural | begin_generic_block | end_generic_block )
     structural = "subsection" | "section" | "chapter" | "subsubsection"
                  | "paragraph" | "subparagraph" | "item"
...
...
@@ -198,24 +198,26 @@ class LaTeXGrammar(Grammar):
     LF = !PARSEP /[ \t]*\n[ \t]*/ # linefeed but not an empty line
     PARSEP = /[ \t]*(?:\n[ \t]*)+\n[ \t]*/ # at least one empty line, i.e.
     # [whitespace] linefeed [whitespace] linefeed
-    EOF = /(?!.)/
     LB = /\s*?\n|$/ # backwards line break for Lookbehind-Operator
     # beginning of text marker '$' added for test code
+    BACKSLASH = /[\\]/
+    EOF = /(?!.)/ # End-Of-File
     """
     begin_generic_block = Forward()
     block_environment = Forward()
     block_of_paragraphs = Forward()
     end_generic_block = Forward()
     text_elements = Forward()
-    source_hash__ = "7f6e1c72047e44b0b39db4d20f5186e2"
+    source_hash__ = "06385bac4dd7cb009bd29712a8fc692c"
     parser_initialization__ = "upon instantiation"
     COMMENT__ = r'%.*(?:\n|$)'
     WSP__ = mixin_comment(whitespace=r'[ \t]*(?:\n(?![ \t]*\n)[ \t]*)?', comment=r'%.*(?:\n|$)')
     wspL__ = ''
     wspR__ = WSP__
-    LB = RegExp('\\s*?\\n|$')
     EOF = RegExp('(?!.)')
+    BACKSLASH = RegExp('[\\\\]')
+    LB = RegExp('\\s*?\\n|$')
     PARSEP = RegExp('[ \\t]*(?:\\n[ \\t]*)+\\n[ \\t]*')
     LF = Series(NegativeLookahead(PARSEP), RegExp('[ \\t]*\\n[ \\t]*'))
     WSPC = RegExp('[ \\t]+')
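The whitespace and comment expressions passed to `mixin_comment` can be examined on their own with `re`. The sketch below does not reproduce `mixin_comment` itself, whose implementation is not part of this diff; it only applies the two patterns separately. It shows that the whitespace pattern may cross a single line break but stops in front of an empty line, so that a paragraph separator is never swallowed as ordinary whitespace.

```python
import re

# The two patterns from the generated grammar class, tried in isolation.
WHITESPACE = re.compile(r'[ \t]*(?:\n(?![ \t]*\n)[ \t]*)?')
COMMENT = re.compile(r'%.*(?:\n|$)')

# A single line break followed by text is consumed as whitespace.
print(repr(WHITESPACE.match('  \n  word').group(0)))   # '  \n  '

# An empty line follows: the lookahead fails, only the leading blanks match.
print(repr(WHITESPACE.match('  \n\nword').group(0)))   # '  '

# A comment runs from '%' to the end of its line, including the line break.
print(repr(COMMENT.match('% note\ntext').group(0)))    # '% note\n'
```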
...
...
@@ -225,8 +227,8 @@ class LaTeXGrammar(Grammar):
     NAME = Capture(RE('\\w+'))
     CMDNAME = RE('\\\\(?:(?!_)\\w)+')
     structural = Alternative(Token("subsection"), Token("section"), Token("chapter"), Token("subsubsection"), Token("paragraph"), Token("subparagraph"), Token("item"))
-    blockcmd = Series(RegExp('[\\\\]'), Alternative(Series(Alternative(Token("begin{"), Token("end{")), Alternative(Token("enumerate"), Token("itemize"), Token("figure"), Token("quote"), Token("quotation"), Token("tabular")), Token("}")), structural, begin_generic_block, end_generic_block))
-    no_command = Alternative(Token("\\begin{"), Token("\\end"), structural)
+    blockcmd = Series(BACKSLASH, Alternative(Series(Alternative(Token("begin{"), Token("end{")), Alternative(Token("enumerate"), Token("itemize"), Token("figure"), Token("quote"), Token("quotation"), Token("tabular")), Token("}")), structural, begin_generic_block, end_generic_block))
+    no_command = Alternative(Token("\\begin{"), Token("\\end"), Series(BACKSLASH, structural))
     word_sequence = OneOrMore(Series(TEXTCHUNK, RE('')))
     cfgtext = OneOrMore(Alternative(word_sequence, Series(ESCAPED, RE(''))))
     text = OneOrMore(Alternative(cfgtext, Series(BRACKETS, RE(''))))
...
...
examples/LaTeX/grammar_tests/test_environment.ini
...
...
@@ -21,7 +21,8 @@
 [fail:block_environment]
-1: "\begin{generic}inline environment\end{generic}"
+1: """\begin{generic}inline environment\end{generic}
+    """
 2: """\begin{generic}inline environment
     \end{generic}
...
...
@@ -33,7 +34,8 @@
 [match:inline_environment]
-1: "\begin{generic}inline environment\end{generic}"
+1: """\begin{generic}inline environment\end{generic}
+    """
 2: """\begin{generic}inline environment
     \end{generic}
...
...
@@ -46,3 +48,61 @@
     invalid enivronment
     \end{generic}
     """
+[match:itemize]
+1: \begin{itemize}
+    \item Items doe not need to be
+    \item separated by empty lines.
+    \end{itemize}
+2: \begin{itemize}
+    \item But items may be
+    \item separated by blank lines.
+    \item Empty lines at the beginning of an item will be ignored.
+    \end{itemize}
+3: \begin{itemize}
+    \item Items can consist of several paragraphs.
+    \item Or of one paragraph
+    \end{itemize}
+4: \begin{itemize}
+    \item
+    \begin{itemize}
+    \item Item-lists can be nested!
+    \end{itemize}
+    \end{itemize}
+[fail:itemize]
+1: \begin{itemize}
+    Free text is not allowed within an itemized environment!
+    \end{itemize}
+[match:enumerate]
+1: \begin{enumerate}
+    \item Enumerations work just like item-lists.
+    \item Only that the bullets are numbers.
+    \end{enumerate}
+2: \begin{enumerate}
+    \item
+    \begin{itemize}
+    \item Item-lists and
+    \item Enumeration-lists
+    \begin{enumerate}
+    \item can be nested
+    \item arbitrarily
+    \end{enumerate}
+    \item Another item
+    \end{itemize}
+    \item Plain numerated item.
+    \end{enumerate}
examples/LaTeX/grammar_tests/test_paragraph.ini
...
...
@@ -15,6 +15,16 @@
     % or like this comment.
     Comment lines do not break paragraphs.
+5: Paragraphs may contain {\em emphasized} or {\bf bold} text.
+    Most of these commands can have different forms as, for example:
+    \begin{small} small \end{small} or {\large large}.
+6: Paragraphs may also contain {\xy unknown blocks }.
+7: Paragraphs may contain \xy[xycgf]{unbknown} commands.
+8: Unknwon \xy commands within paragraphs may be simple
+    or \xy{complex}.
 [fail:paragraph]
 1: \begin{enumerate}
...
...