
Commit 4f6c3ae8 authored by Eckhart Arnold

further LaTeX tests

parent c13ed3d3
......@@ -294,13 +294,19 @@ class Parser(ParserBase, metaclass=ParserMetaClass):
only to their class name, and not to the individual parser.
Parser objects are callable and parsing is done by calling a parser
object with the text to parse. If the parser matches it returns
a tuple consisting of a node representing the root of the concrete
syntax tree resulting from the match as well as the substring
`text[i:]` where i is the length of matched text (which can be
zero in the case of parsers like `ZeroOrMore` or `Optional`).
If `i > 0` then the parser has "moved forward". If the parser does
not match it returns `(None, text)`.
object with the text to parse.
If the parser matches it returns a tuple consisting of a node
representing the root of the concrete syntax tree resulting from the
match as well as the substring `text[i:]` where i is the length of
matched text (which can be zero in the case of parsers like
`ZeroOrMore` or `Optional`). If `i > 0` then the parser has "moved
forward".
If the parser does not match it returns `(None, text)`. **Note** that
this is not the same as an empty match `("", text)`. An empty match
can for example be returned by the `ZeroOrMore`-parser in case the
contained parser is repeated zero times.
"""
ApplyFunc = Callable[['Parser'], None]
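The return convention described in the docstring above can be illustrated with a small stand-alone sketch. It does not use the project's `Parser` or `Node` classes: `match_word` and `zero_or_more` are hypothetical helpers that work on plain tuples, and only the distinction between a non-match (`None`) and an empty match is taken from the docstring.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for a parser: takes text, returns (result, rest).
# A result of None signals a non-match; the real code returns Node objects.
ParseFunc = Callable[[str], Tuple[Optional[tuple], str]]


def match_word(text: str) -> Tuple[Optional[tuple], str]:
    """Match one word plus trailing whitespace, or return (None, text)."""
    m = re.match(r'\w+\s*', text)
    if m is None:
        return None, text              # non-match: text is handed back untouched
    return (m.group(0),), text[m.end():]


def zero_or_more(parser: ParseFunc) -> ParseFunc:
    """Repeat `parser` as long as it matches and moves forward."""
    def parse(text: str) -> Tuple[Optional[tuple], str]:
        results = ()                   # type: tuple
        while True:
            result, rest = parser(text)
            if result is None or len(rest) == len(text):
                break                  # no match, or no progress
            results += result
            text = rest
        return results, text           # always a match, possibly an empty one
    return parse


words = zero_or_more(match_word)
no_match = match_word("!?")            # (None, "!?")
empty_match = words("!?")              # ((), "!?")
full_match = words("two words")        # (("two ", "words"), "")
```

The point of the sketch is that a caller must test `result is None` rather than truthiness, since an empty match is falsy but still counts as a match.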
......@@ -674,8 +680,17 @@ class Grammar:
stitches.append(Node(None, rest))
result = Node(None, tuple(stitches))
if any(self.variables__.values()):
result.add_error("Capture-retrieve-stack not empty after end of parsing: "
+ str(self.variables__))
error_str = "Capture-retrieve-stack not empty after end of parsing: " + \
str(self.variables__)
if result.children:
# add another child node at the end to ensure that the position
# of the error will be the end of the text. Otherwise, the error
# message above ("...after end of parsing") would appear illogical.
error_node = Node(ZOMBIE_PARSER, '')
error_node.add_error(error_str)
result.result = result.children + (error_node,)
else:
result.add_error(error_str)
result.pos = 0 # calculate all positions
return result
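The idea behind the trailing error node in the hunk above can be sketched in isolation. `MiniNode` and `report_at_end` are hypothetical, heavily simplified stand-ins for the project's `Node` class and the code in `Grammar`; the position bookkeeping here is an assumption made for illustration only.

```python
from typing import List, Tuple, Union


class MiniNode:
    """Hypothetical, much simplified stand-in for the project's Node class."""

    def __init__(self, result: Union[str, Tuple['MiniNode', ...]]) -> None:
        self.result = result
        self.errors = []               # type: List[str]
        self.pos = 0

    @property
    def children(self) -> Tuple['MiniNode', ...]:
        return self.result if isinstance(self.result, tuple) else ()

    def length(self) -> int:
        if self.children:
            return sum(child.length() for child in self.children)
        return len(self.result)

    def assign_positions(self, pos: int = 0) -> None:
        """Give every node the offset at which its text starts."""
        self.pos = pos
        for child in self.children:
            child.assign_positions(pos)
            pos += child.length()


def report_at_end(root: MiniNode, message: str) -> MiniNode:
    """Attach `message` so that its position is the end of the text."""
    if root.children:
        error_node = MiniNode('')      # empty node appended after all children
        error_node.errors.append(message)
        root.result = root.children + (error_node,)
        target = error_node
    else:
        root.errors.append(message)    # leaf root: nothing to append to
        target = root
    root.assign_positions()
    return target


root = MiniNode((MiniNode("abc"), MiniNode("de")))
node = report_at_end(root, "stack not empty")
```

With the two children covering five characters, the appended node ends up at position 5, so a message about the state "after end of parsing" points to the end of the text rather than to position 0.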
......@@ -886,14 +901,14 @@ class RE(Parser):
Regular Expressions with optional leading or trailing whitespace.
The RE-parser parses pieces of text that match a given regular
expression. Other than the ``RegExp``-Parser it can also skip
expression. Other than the ``RegExp``-Parser it can also skip
"implicit whitespace" before or after the matched text.
The whitespace is in turn defined by a regular expression. It
should be made sure that this expression also matches the empty
string, e.g. use r'\s*' or r'[\t ]*', but not r'\s+'. If the
respective parameters in the constructor are set to ``None`` the
default whitespace expression from the Grammar object will be used.
The whitespace is in turn defined by a regular expression. It should
be made sure that this expression also matches the empty string,
e.g. use r'\s*' or r'[\t ]*', but not r'\s+'. If the respective
parameters in the constructor are set to ``None`` the default
whitespace expression from the Grammar object will be used.
Example (allowing whitespace on the right hand side, but not on
the left hand side of a regular expression):
......@@ -976,9 +991,8 @@ class RE(Parser):
class Token(RE):
"""
Class Token parses simple strings. Any regular
expression commands will be interpreted as a simple sequence of
characters.
Class Token parses simple strings. Any regular expression
commands will be interpreted as a simple sequence of characters.
Other than that class Token is essentially a renamed version of
class RE. Because tokens often have a particular semantic different
......@@ -1000,16 +1014,16 @@ class Token(RE):
########################################################################
#
# Combinator parser classes (i.e. trunk classes of the parser tree)
# Containing parser classes, i.e. parsers that contain other parsers
# to which they delegate (i.e. trunk classes)
#
########################################################################
class UnaryOperator(Parser):
"""
Base class of all unary parser operators, i.e. parsers that
contain one and only one other parser, like the optional
parser for example.
Base class of all unary parser operators, i.e. parsers that contain
one and only one other parser, like the optional parser for example.
The UnaryOperator base class supplies __deepcopy__ and apply
methods for unary parser operators. The __deepcopy__ method needs
......@@ -1036,10 +1050,10 @@ class NaryOperator(Parser):
contains one or more other parsers, like the alternative
parser for example.
The NaryOperator base class supplies __deepcopy__ and apply
methods for n-ary parser operators. The __deepcopy__ method needs
to be overwritten, however, if the constructor of a derived class
has additional parameters.
The NaryOperator base class supplies __deepcopy__ and apply methods
for n-ary parser operators. The __deepcopy__ method needs to be
overwritten, however, if the constructor of a derived class has
additional parameters.
"""
def __init__(self, *parsers: Parser, name: str = '') -> None:
super(NaryOperator, self).__init__(name)
......@@ -1103,6 +1117,19 @@ class Optional(UnaryOperator):
class ZeroOrMore(Optional):
"""
`ZeroOrMore` applies a parser repeatedly as long as this parser
matches. Like `Optional` the `ZeroOrMore` parser always matches. In
case of zero repetitions, the empty match `((), text)` is returned.
Examples:
>>> sentence = ZeroOrMore(RE(r'\w+,?')) + Token('.')
>>> Grammar(sentence)('Wo viel der Weisheit, da auch viel des Grämens.').content()
'Wo viel der Weisheit, da auch viel des Grämens.'
EBNF-Notation: `{ ... }`
EBNF-Example: `sentence = { /\w+,?/ } "."`
"""
def __call__(self, text: str) -> Tuple[Node, str]:
results = () # type: Tuple[Node, ...]
n = len(text) + 1
......
......@@ -197,8 +197,6 @@ class Node:
# self.pos: int = 0 # continuous updating of pos values wastes a lot of time
self._pos = -1 # type: int
self.parser = parser or ZOMBIE_PARSER
self.error_flag = any(r.error_flag for r in self._children) \
if self._children else False # type: bool
def __str__(self):
if self.children:
......@@ -242,6 +240,8 @@ class Node:
self._result = (result,) if isinstance(result, Node) else result or '' # type: StrictResultType
self._children = cast(ChildrenType, self._result) \
if isinstance(self._result, tuple) else cast(ChildrenType, ()) # type: ChildrenType
self.error_flag = any(r.error_flag for r in self._children) \
if self._children else False # type: bool
@property
def children(self) -> ChildrenType:
......
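A possible reading of the two `Node` hunks above: computing `error_flag` inside the `result` setter, rather than once in the constructor, keeps the flag in step with the children whenever `result` is reassigned. The sketch below assumes this motivation; `FlagNode` and its `add_error` method are hypothetical simplifications, not the project's code.

```python
class FlagNode:
    """Hypothetical, simplified stand-in for Node; only models error_flag."""

    def __init__(self, result):
        self.result = result           # runs the setter below

    @property
    def result(self):
        return self._result

    @result.setter
    def result(self, result):
        self._result = (result,) if isinstance(result, FlagNode) else result or ''
        self._children = self._result if isinstance(self._result, tuple) else ()
        # Recomputed on every assignment, so it cannot go stale.
        self.error_flag = any(c.error_flag for c in self._children)

    def add_error(self):
        self.error_flag = True
        return self


parent = FlagNode('plain text')
flag_before = parent.error_flag        # False: no children, no errors
parent.result = (FlagNode('ok'), FlagNode('bad').add_error())
flag_after = parent.error_flag         # True: picked up from the new child
```

This matters for code like `result.result = result.children + (error_node,)` in the `Grammar` hunk, where a node carrying an error is attached after the parent was constructed.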
......@@ -119,11 +119,11 @@ text = { cfgtext | (BRACKETS //~) }+
cfgtext = { word_sequence | (ESCAPED //~) }+
word_sequence = { TEXTCHUNK //~ }+
no_command = "\begin{" | "\end" | structural
blockcmd = /[\\]/ ( ( "begin{" | "end{" )
( "enumerate" | "itemize" | "figure" | "quote"
| "quotation" | "tabular") "}"
| structural | begin_generic_block | end_generic_block )
no_command = "\begin{" | "\end" | BACKSLASH structural
blockcmd = BACKSLASH ( ( "begin{" | "end{" )
( "enumerate" | "itemize" | "figure" | "quote"
| "quotation" | "tabular") "}"
| structural | begin_generic_block | end_generic_block )
structural = "subsection" | "section" | "chapter" | "subsubsection"
| "paragraph" | "subparagraph" | "item"
......@@ -147,7 +147,8 @@ WSPC = /[ \t]+/ # (horizontal) whitespace
LF = !PARSEP /[ \t]*\n[ \t]*/ # linefeed but not an empty line
PARSEP = /[ \t]*(?:\n[ \t]*)+\n[ \t]*/ # at least one empty line, i.e.
# [whitespace] linefeed [whitespace] linefeed
EOF = /(?!.)/
LB = /\s*?\n|$/ # backwards line break for Lookbehind-Operator
# beginning of text marker '$' added for test code
\ No newline at end of file
# beginning of text marker '$' added for test code
BACKSLASH = /[\\]/
EOF = /(?!.)/ # End-Of-File
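The interplay of `PARSEP`, `LF` and `EOF` in the grammar above can be checked with Python's `re` module alone. This is only an approximation: the grammar's `!PARSEP` is rendered here as a regex negative lookahead, whereas the generated parser uses `NegativeLookahead(PARSEP)` in a `Series`.

```python
import re

# Regular expressions copied from the grammar above.
PARSEP = re.compile(r'[ \t]*(?:\n[ \t]*)+\n[ \t]*')   # at least one empty line
EOF = re.compile(r'(?!.)')                            # no further character
BACKSLASH = re.compile(r'[\\]')

# LF = !PARSEP /[ \t]*\n[ \t]*/, approximated with a regex lookahead.
LF = re.compile(r'(?!' + PARSEP.pattern + r')[ \t]*\n[ \t]*')

single_break = "  \n  next line"       # linefeed inside a paragraph
empty_line = "\n\nnext paragraph"      # paragraph separator

lf_on_single = LF.match(single_break)      # matches "  \n  "
lf_on_empty = LF.match(empty_line)         # None: PARSEP applies here
parsep_on_empty = PARSEP.match(empty_line) # matches "\n\n"
parsep_on_single = PARSEP.match(single_break)  # None
```

So a single linefeed is consumed by `LF` and keeps the paragraph together, while a blank line is left for `PARSEP`.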
......@@ -170,11 +170,11 @@ class LaTeXGrammar(Grammar):
cfgtext = { word_sequence | (ESCAPED //~) }+
word_sequence = { TEXTCHUNK //~ }+
no_command = "\begin{" | "\end" | structural
blockcmd = /[\\]/ ( ( "begin{" | "end{" )
( "enumerate" | "itemize" | "figure" | "quote"
| "quotation" | "tabular") "}"
| structural | begin_generic_block | end_generic_block )
no_command = "\begin{" | "\end" | BACKSLASH structural
blockcmd = BACKSLASH ( ( "begin{" | "end{" )
( "enumerate" | "itemize" | "figure" | "quote"
| "quotation" | "tabular") "}"
| structural | begin_generic_block | end_generic_block )
structural = "subsection" | "section" | "chapter" | "subsubsection"
| "paragraph" | "subparagraph" | "item"
......@@ -198,24 +198,26 @@ class LaTeXGrammar(Grammar):
LF = !PARSEP /[ \t]*\n[ \t]*/ # linefeed but not an empty line
PARSEP = /[ \t]*(?:\n[ \t]*)+\n[ \t]*/ # at least one empty line, i.e.
# [whitespace] linefeed [whitespace] linefeed
EOF = /(?!.)/
LB = /\s*?\n|$/ # backwards line break for Lookbehind-Operator
# beginning of text marker '$' added for test code
BACKSLASH = /[\\]/
EOF = /(?!.)/ # End-Of-File
"""
begin_generic_block = Forward()
block_environment = Forward()
block_of_paragraphs = Forward()
end_generic_block = Forward()
text_elements = Forward()
source_hash__ = "7f6e1c72047e44b0b39db4d20f5186e2"
source_hash__ = "06385bac4dd7cb009bd29712a8fc692c"
parser_initialization__ = "upon instantiation"
COMMENT__ = r'%.*(?:\n|$)'
WSP__ = mixin_comment(whitespace=r'[ \t]*(?:\n(?![ \t]*\n)[ \t]*)?', comment=r'%.*(?:\n|$)')
wspL__ = ''
wspR__ = WSP__
LB = RegExp('\\s*?\\n|$')
EOF = RegExp('(?!.)')
BACKSLASH = RegExp('[\\\\]')
LB = RegExp('\\s*?\\n|$')
PARSEP = RegExp('[ \\t]*(?:\\n[ \\t]*)+\\n[ \\t]*')
LF = Series(NegativeLookahead(PARSEP), RegExp('[ \\t]*\\n[ \\t]*'))
WSPC = RegExp('[ \\t]+')
......@@ -225,8 +227,8 @@ class LaTeXGrammar(Grammar):
NAME = Capture(RE('\\w+'))
CMDNAME = RE('\\\\(?:(?!_)\\w)+')
structural = Alternative(Token("subsection"), Token("section"), Token("chapter"), Token("subsubsection"), Token("paragraph"), Token("subparagraph"), Token("item"))
blockcmd = Series(RegExp('[\\\\]'), Alternative(Series(Alternative(Token("begin{"), Token("end{")), Alternative(Token("enumerate"), Token("itemize"), Token("figure"), Token("quote"), Token("quotation"), Token("tabular")), Token("}")), structural, begin_generic_block, end_generic_block))
no_command = Alternative(Token("\\begin{"), Token("\\end"), structural)
blockcmd = Series(BACKSLASH, Alternative(Series(Alternative(Token("begin{"), Token("end{")), Alternative(Token("enumerate"), Token("itemize"), Token("figure"), Token("quote"), Token("quotation"), Token("tabular")), Token("}")), structural, begin_generic_block, end_generic_block))
no_command = Alternative(Token("\\begin{"), Token("\\end"), Series(BACKSLASH, structural))
word_sequence = OneOrMore(Series(TEXTCHUNK, RE('')))
cfgtext = OneOrMore(Alternative(word_sequence, Series(ESCAPED, RE(''))))
text = OneOrMore(Alternative(cfgtext, Series(BRACKETS, RE(''))))
......
......@@ -21,7 +21,8 @@
[fail:block_environment]
1 : "\begin{generic}inline environment\end{generic}"
1 : """\begin{generic}inline environment\end{generic}
"""
2 : """\begin{generic}inline environment
\end{generic}
......@@ -33,7 +34,8 @@
[match:inline_environment]
1 : "\begin{generic}inline environment\end{generic}"
1 : """\begin{generic}inline environment\end{generic}
"""
2 : """\begin{generic}inline environment
\end{generic}
......@@ -46,3 +48,61 @@
invalid environment \end{generic}
"""
[match:itemize]
1 : \begin{itemize}
\item Items do not need to be
\item separated by empty lines.
\end{itemize}
2 : \begin{itemize}
\item But items may be
\item separated by blank lines.
\item
Empty lines at the beginning of an item will be ignored.
\end{itemize}
3 : \begin{itemize}
\item Items can consist of
several paragraphs.
\item Or of one paragraph
\end{itemize}
4 : \begin{itemize}
\item
\begin{itemize}
\item Item-lists can be nested!
\end{itemize}
\end{itemize}
[fail:itemize]
1 : \begin{itemize}
Free text is not allowed within an itemized environment!
\end{itemize}
[match:enumerate]
1 : \begin{enumerate}
\item Enumerations work just like item-lists.
\item Only that the bullets are numbers.
\end{enumerate}
2: \begin{enumerate}
\item \begin{itemize}
\item Item-lists and
\item Enumeration-lists
\begin{enumerate}
\item can be nested
\item arbitrarily
\end{enumerate}
\item Another item
\end{itemize}
\item Plain numerated item.
\end{enumerate}
......@@ -15,6 +15,16 @@
% or like this comment.
Comment lines do not break paragraphs.
5 : Paragraphs may contain {\em emphasized} or {\bf bold} text.
Most of these commands can have different forms as, for example:
\begin{small} small \end{small} or {\large large}.
6 : Paragraphs may also contain {\xy unknown blocks }.
7 : Paragraphs may contain \xy[xycgf]{unknown} commands.
8 : Unknown \xy commands within paragraphs may be simple
or \xy{complex}.
[fail:paragraph]
1 : \begin{enumerate}
......