Commit 5d1c1750 authored by Eckhart Arnold

typos corrected

parent 4e0da72c
<component name="ProjectRunConfigurationManager">
<configuration default="false" name="LaTeXParser" type="PythonConfigurationType" factoryName="Python" nameIsGenerated="true">
<module name="DHParser" />
<option name="INTERPRETER_OPTIONS" value="" />
<option name="PARENT_ENVS" value="true" />
<envs>
<env name="PYTHONUNBUFFERED" value="1" />
</envs>
<option name="SDK_HOME" value="" />
<option name="WORKING_DIRECTORY" value="$PROJECT_DIR$/examples/LaTeX" />
<option name="IS_MODULE_SDK" value="true" />
<option name="ADD_CONTENT_ROOTS" value="true" />
<option name="ADD_SOURCE_ROOTS" value="true" />
<option name="SCRIPT_NAME" value="$PROJECT_DIR$/examples/LaTeX/LaTeXParser.py" />
<option name="PARAMETERS" value="testdata/Voegelins_Bewusstseinsphilosophie/Voegelins_Bewusstseinsphilosophie.tex" />
<option name="SHOW_COMMAND_LINE" value="false" />
<option name="EMULATE_TERMINAL" value="false" />
<option name="MODULE_MODE" value="false" />
<option name="REDIRECT_INPUT" value="false" />
<option name="INPUT_FILE" value="" />
<method v="2" />
</configuration>
</component>
\ No newline at end of file
......@@ -1035,7 +1035,7 @@ in the sense of "without" more intuitive::
Next to the lookahead operators, there also exist lookback operators. Be warned,
though, that lookback operators are an **experimental** feature in DHParser
and that their implementation is highly idiosyncratic, that is, it is most
lilely not compatible with any other parser-generator-toolkit based on EBNF-grammers.
likely not compatible with any other parser-generator toolkit based on EBNF grammars.
Also, lookback operators in DHParser are more restricted than lookahead-operators.
They can only be used in combination with simple text or regular expression parsers
and - here comes the idiosyncratic part - they work in the opposite direction.
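To illustrate the reversed matching direction, here is a minimal Python sketch (not DHParser's actual implementation; the function name `lookback_match` is made up for this example) of a lookback that matches the already parsed text backwards from the current position:

```python
import re

def lookback_match(pattern: str, text: str, pos: int) -> bool:
    """Check whether `pattern` matches the text *before* `pos`,
    read backwards from the current position (illustrative only)."""
    preceding = text[:pos][::-1]   # reverse the already parsed text
    return re.match(pattern, preceding) is not None

# After parsing "abc", a lookback for "c" (the last consumed character) succeeds:
text = "abc rest"
# lookback_match("c", text, 3) → True
# lookback_match("a", text, 3) → False
```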
......@@ -1082,14 +1082,14 @@ There are three different challenges:
1. Locating the error at the correct position in the source code.
2. Providing proper error messages that explain the reason for the error.
3. Resuming the parsing progress after an error has occured at the nearest
3. Resuming the parsing progress after an error has occurred at the nearest
possible place without producing artificial follow-up errors.
In the following, DHParser's techniques for the first two challenges,
locating errors and customizing error messages, will be described.
Techniques for resuming the parsing process after an error occurred
or for passing by erreneous passages in the source code will be
explained below, unter the heading "Fail-tolerant Parsing".
or for passing by erroneous passages in the source code will be
explained below, under the heading "Fail-tolerant Parsing".
Farthest-Fail-Heuristics
^^^^^^^^^^^^^^^^^^^^^^^^
......@@ -1101,7 +1101,7 @@ the error at the "farthest" position where a parser failed,
reporting the last named parser in the call chain (that first reached
this location) as the cause of the failure. This approach often works
surprisingly well for locating errors, unless the grammar relies too
heavy on regular exrpessions capturing large chunks of text, because
heavily on regular expressions capturing large chunks of text, because
the error location works only on the level of the parsing expression
grammar, not at that of the atomic regular expressions. To see how
farthest fail works, consider a parser for simple arithmetic
......@@ -1126,18 +1126,18 @@ expressions::
As can be seen, the location of the error is captured well enough,
at least when we keep in mind that the computer cannot guess where
we would have placed the forgotten closing bracket. It can only
report the point where the mistake becomes apparant.
report the point where the mistake becomes apparent.
However, the reported fact that it was the sub-parser `*` of
the parser term that failed at this location does little to enlighten
us with respect to the cause of the failure. The "farthest fail"-method
as implemented by DHParser yields the
first parser (of possibly several) that has been tried at the
position where the farthest fail occured. Thus, in this case,
position where the farthest fail occurred. Thus, in this case,
a failure of the parser capturing `*` is reported rather than
of the parser expression->`+`. Changing this by reporting the
last parser or all parsers that failed at this location would
do little to remedy this situaiton, however. In this example,
do little to remedy this situation, however. In this example,
it would just be as confusing to learn that expression->`+` failed
at the end of the parsed string.
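The farthest-fail bookkeeping described above can be sketched in plain Python; the names `FarthestFail` and `expect` are hypothetical and only illustrate the principle, not DHParser's internals:

```python
import re

class FarthestFail:
    """Track the rightmost position at which any parser failed,
    together with the name of the first parser tried there."""
    def __init__(self):
        self.pos = -1
        self.parser = ""

    def record(self, pos: int, name: str):
        if pos > self.pos:        # only the *farthest* failure is kept
            self.pos, self.parser = pos, name

def expect(pattern: str, name: str, text: str, pos: int, ff: FarthestFail):
    """Match `pattern` at `pos`; on failure, record position and name."""
    m = re.match(pattern, text[pos:])
    if m:
        return pos + m.end()
    ff.record(pos, name)          # remember where and who failed first
    return None

ff = FarthestFail()
text = "(3 + 4"                   # missing closing bracket
pos = expect(r"\(", "bracket", text, 0, ff)
pos = expect(r"\d+ \+ \d+", "sum", text, pos, ff)
pos = expect(r"\)", "bracket", text, pos, ff)   # fails at position 6
# ff.pos == 6, ff.parser == "bracket"
```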
......@@ -1147,8 +1147,8 @@ Marking mandatory items with "§"
Thus, the "farthest fail"-method is not very suitable for explaining
the failure or pinpointing which parser really was the culprit.
Therefore, DHParser provides a simple annotation that allows one to
raise a parsing error deliberately, if a ceratin point in the
chain of parsers has not been reached: By placind the "§"-sign
raise a parsing error deliberately, if a certain point in the
chain of parsers has not been reached: By placing the "§"-sign
as a "mandatory-marker" in front of a parser, the parser as well
as all subsequent parsers in the same sequence, will not simply
return a non-match when failing, but will cause the entire
......@@ -1178,13 +1178,13 @@ to place the mandatory marker in front of a parser that might fail at a location
that could still be reached and matched by another branch of the grammar.
(In our example it is clear that round brackets enclose only groups. Thus,
if the opening round bracket has matched, we can be sure that what follows
must be an expression fllowed by a closing round bracket, or, if not it is
must be an expression followed by a closing round bracket, or, if not, it is
a mistake.) Luckily, although this may sound complicated, in practice it
never is. Unless your grammar is very badly structured, you will hardly
ever make this mistake, and if you do, you will notice soon enough.
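The effect of the §-marker can be sketched with a toy sequence combinator in Python; `ParseError`, `sequence` and the `mandatory` index are made-up names that only illustrate the soft-failure/hard-failure distinction:

```python
import re

class ParseError(Exception):
    """Raised instead of returning a non-match once the mandatory
    position in a sequence has been passed (mimics the §-marker)."""

def sequence(parsers, mandatory: int, text: str, pos: int):
    """Try `parsers` in order; before index `mandatory` a failure is a
    plain non-match (return None), from `mandatory` on it is an error."""
    for i, (name, pattern) in enumerate(parsers):
        m = re.match(pattern, text[pos:])
        if not m:
            if i < mandatory:
                return None       # soft failure: other branches may match
            raise ParseError(f"{name} expected at {pos}")  # hard failure
        pos += m.end()
    return pos

group = [("(", r"\("), ("expression", r"\d+"), (")", r"\)")]
# once '(' has matched, the rest is mandatory, as if written: "(" §expression ")"
sequence(group, 1, "(42)", 0)   # → 4
sequence(group, 1, "[42]", 0)   # → None (soft: '(' did not match)
# sequence(group, 1, "(42", 0)  # raises ParseError: ) expected at 3
```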
Also, there is an important restriction: There is only one §-marker
allowed per named parser. In case you have a long EBFN-expression on the
allowed per named parser. In case you have a long EBNF-expression on the
right hand side of a symbol-definition, where you'd like to use the
§-marker at more than one place, you can, however, always split it into
several expressions by introducing new symbols. These symbols, if they
......@@ -1218,7 +1218,7 @@ name consists of the name of a symbol that contains a §-marker and the
appendix `_error`. The directive always takes two arguments, separated
as usual by a comma, of which the first is a condition-expression and
the second an error message. The condition can be used to make
the choice of an error-message dependenant on the text following the
the choice of an error-message dependent on the text following the
point of failure. It can either be
a regular expression or a simple string which must match (or, in the
case of a string, be equal to) the first part of the text at the
......@@ -1258,7 +1258,7 @@ different conditions and messages but related to the same symbol
can be specified. The conditions are evaluated in the order the
error-directives appear in the grammar and the error message
of the first matching condition is picked. Therefore, the more
specfic conditions should always be placed first and the more
specific conditions should always be placed first and the more
general or fallback conditions should be placed below these::
>>> grammar = ("@ string_error = /\\\\\\/, 'Illegal escape sequence »{1}« "
......@@ -1269,7 +1269,7 @@ general or fallback conditions should be placed below these::
1:4: Error (1040): Parser "string" stopped before end, at: \pha" Terminating parser.
Here, the more specific and more understandable error message
has been selectec. Careful readers might notice that the the
has been selected. Careful readers might notice that the
more general customized error message "Illegal character(s)
... found in string" will now only be selected, if the
string contains a character that not even regular expression
......@@ -1278,7 +1278,7 @@ is not allowed within the string are the closing quotation
marks that terminate the string and which do not cause the
parser to fail (but only to terminate too early).
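The first-match-wins selection among several error-directives can be sketched as follows; the `error_directives` list and `pick_message` are hypothetical stand-ins for what the grammar directives configure:

```python
import re

# Hypothetical (condition, message) pairs in the order they would appear
# in the grammar: the most specific condition first, the fallback last.
error_directives = [
    (r"\\", "Illegal escape sequence »{}«"),         # specific: backslash ahead
    (r"",   "Illegal character(s) »{}« in string"),  # fallback: matches anything
]

def pick_message(rest: str) -> str:
    """Return the message of the *first* condition matching the text
    following the point of failure (first match wins)."""
    for condition, message in error_directives:
        if re.match(condition, rest):
            return message.format(rest[:6])
    return "unknown error"

pick_message('\\pha"')   # specific message: the text starts with a backslash
pick_message("~~~")      # fallback message: no earlier condition matched
```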
Also, it might be noticed that the errors are alwayes caused
Also, it might be noticed that the errors are always caused
by a failure to match the second `"`-sign, because the
characters-parser also matches the empty string and thus
never fails or raises any error. Nonetheless, the error
......@@ -1340,7 +1340,7 @@ explicitly to such nodes, afterwards:
>>> for e in tree.errors: print(e)
1:2: Error (1000): Fehler im String: al\pha
Unfortunetely, the error location is not very precise. This can be remedied
Unfortunately, the error location is not very precise. This can be remedied
by refining our error junction code::
>>> grammar = '''
......@@ -1368,7 +1368,7 @@ Here, the node named "ups" pinpoints the precise error location.
Like most techniques for fail-tolerant parsing, this one is not quite
as easy to master in practice as it might look. Generally, adding
a junction for erroneous code works best, when the passage that shall
be by-passed is delineated by a easily recongnizable follow-up strings.
be by-passed is delineated by an easily recognizable follow-up string.
In this example the follow-up string would be the '"'. The method fails,
of course, if the follow-up text is erroneous, too, or has even been
forgotten. So, to be absolutely sure, one would have to consider
......@@ -1438,14 +1438,14 @@ parser when an error was raised by the string-parser::
(ZOMBIE__ `(1:4: Error (1010): Illegal escape sequence »\pha"...«) "\pha")
(:Text '"'))
After the error has occured at the illegal escape-sequence, the
After the error has occurred at the illegal escape-sequence, the
skip-directive catches the error and skips to the location where the
"-character lies just ahead and continues parsing with the string-parser.
The skipped passage is stored in a ZOMBIE__-Node within the syntax-tree
and parsing can continue through to the end of the text.
In contrast to the skip-directive the resume-directive leaves the parser
that raised the error and resumes one level higer up in the call chain.
that raised the error and resumes one level higher up in the call chain.
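The skipping behaviour can be sketched in a few lines of Python; `skip_to` is a made-up helper that only illustrates the principle of scanning ahead to a recovery point:

```python
def skip_to(marker: str, text: str, pos: int):
    """On error, scan ahead to the next occurrence of `marker` and
    return (skipped_passage, resume_position); the skipped passage
    would be stored in a ZOMBIE__-style node in the syntax tree."""
    i = text.find(marker, pos)
    if i < 0:
        return text[pos:], len(text)   # no recovery point: skip to the end
    return text[pos:i], i

# error raised inside the string at the illegal escape sequence:
text = '"al\\pha" and more'
skipped, resume = skip_to('"', text, 3)   # failure occurred at the backslash
# skipped == '\\pha', resume == 7 → parsing continues at the closing '"'
```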
Semantic Actions and Storing Variables
......
This diff is collapsed.
[match:element]
[ast:element]
[fail:element]
[match:group]
[ast:group]
[fail:group]
[match:unordered]
[ast:unordered]
[fail:unordered]
[match:interleave]
[ast:interleave]
[fail:interleave]
[match:oneormore]
[ast:oneormore]
[fail:oneormore]
[match:repetition]
[ast:repetition]
[fail:repetition]
[match:option]
[ast:option]
[fail:option]
[match:pure_elem]
[ast:pure_elem]
[fail:pure_elem]
[match:element]
[ast:element]
[fail:element]
[match:flowmarker]
[ast:flowmarker]
[fail:flowmarker]
[match:retrieveop]
[ast:retrieveop]
[fail:retrieveop]
[match:expression]
[ast:expression]
[fail:expression]
[match:term]
[ast:term]
[fail:term]
[match:factor]
[ast:factor]
[fail:factor]
[match:syntax]
[ast:syntax]
[fail:syntax]
[match:definition]
[ast:definition]
[fail:definition]
[match:directive]
[ast:directive]
[fail:directive]
{
"jsonrpc": "2.0",
"id": 0,
"method": "initialize",
"params": {
"processId": 3408,
"rootPath": "/home/eckhart/Entwicklung/DHParser/examples/EBNF",
"rootUri": "file:///home/eckhart/Entwicklung/DHParser/examples/EBNF",
"capabilities": {
"workspace": {
"applyEdit": true,
"workspaceEdit": {
"documentChanges": true
},
"didChangeConfiguration": {
"dynamicRegistration": true
},
"didChangeWatchedFiles": {
"dynamicRegistration": true
},
"symbol": {
"dynamicRegistration": true,
"symbolKind": {
"valueSet": [
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26
]
}
},
"executeCommand": {
"dynamicRegistration": true
},
"configuration": true,
"workspaceFolders": true
},
"textDocument": {
"publishDiagnostics": {
"relatedInformation": true
},
"synchronization": {
"dynamicRegistration": true,
"willSave": true,
"willSaveWaitUntil": true,
"didSave": true
},
"completion": {
"dynamicRegistration": true,
"contextSupport": true,
"completionItem": {
"snippetSupport": true,
"commitCharactersSupport": true,
"documentationFormat": [
"markdown",
"plaintext"
],
"deprecatedSupport": true,
"preselectSupport": true
},
"completionItemKind": {
"valueSet": [
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25
]
}
},
"hover": {
"dynamicRegistration": true,
"contentFormat": [
"markdown",
"plaintext"
]
},
"signatureHelp": {
"dynamicRegistration": true,
"signatureInformation": {
"documentationFormat": [
"markdown",
"plaintext"
]
}
},
"definition": {
"dynamicRegistration": true
},
"references": {
"dynamicRegistration": true
},
"documentHighlight": {
"dynamicRegistration": true
},
"documentSymbol": {
"dynamicRegistration": true,
"symbolKind": {
"valueSet": [
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26
]
},
"hierarchicalDocumentSymbolSupport": true
},
"codeAction": {
"dynamicRegistration": true,
"codeActionLiteralSupport": {
"codeActionKind": {
"valueSet": [
"",
"quickfix",
"refactor",
"refactor.extract",
"refactor.inline",
"refactor.rewrite",
"source",
"source.organizeImports"
]
}
}
},
"codeLens": {
"dynamicRegistration": true
},
"formatting": {
"dynamicRegistration": true
},
"rangeFormatting": {
"dynamicRegistration": true
},
"onTypeFormatting": {
"dynamicRegistration": true
},
"rename": {
"dynamicRegistration": true
},
"documentLink": {
"dynamicRegistration": true
},
"typeDefinition": {
"dynamicRegistration": true
},
"implementation": {
"dynamicRegistration": true
},
"colorProvider": {
"dynamicRegistration": true
},
"foldingRange": {
"dynamicRegistration": true,
"rangeLimit": 5000,
"lineFoldingOnly": true
}
}
},
"trace": "verbose",
"workspaceFolders": [
{
"uri": "file:///home/eckhart/Entwicklung/DHParser/examples/EBNF",
"name": "EBNF"
}
]
}
}
\ No newline at end of file
# LaTeX-Grammar for DHParser
# preamble
@ literalws = right
@ whitespace = /[ \t]*(?:\n(?![ \t]*\n)[ \t]*)?/ # insignificant whitespace, including at most one linefeed
@ comment = /%.*/ # note: trailing linefeed is not part of the comment proper
@ reduction = merge_treetops
@ disposable = _WSPC, _GAP, _LB, _PARSEP, _LETTERS, _NAME, INTEGER, FRAC,
_QUALIFIED, TEXT_NOPAR, TEXT, _block_content, PATH, PATHSEP,
HASH, COLON, TAG, _inline_math_text, _has_block_start,
block_environment, known_environment, text_element, _block_math,
line_element, inline_environment, known_inline_env, info_block,
begin_inline_env, end_inline_env, command, known_command,
_dmath_long_form, _dmath_short_form, BACKSLASH, _structure_name,
_env_name
@ drop = strings, backticked, whitespace, regexps, _WSPC, _GAP, _PARSEP, _LB,
_has_block_start, BACKSLASH, _structure_name, _env_name
########################################################################
#
#: outer document structure
#
########################################################################
latexdoc = preamble §document
preamble = { [_WSPC] command }+
document = [_WSPC] "\begin{document}"
§frontpages
(Chapters | Sections)
[Bibliography] [Index] [_WSPC]
"\end{document}" [_WSPC] EOF
frontpages = sequence
#######################################################################
#
#: document structure
#
#######################################################################
Chapters = { [_WSPC] Chapter }+
Chapter = `\chapter` [hide_from_toc] heading { sequence | Sections | Paragraphs }
Sections = { [_WSPC] Section }+
Section = `\section` [hide_from_toc] heading { sequence | SubSections | Paragraphs }
SubSections = { [_WSPC] SubSection }+
SubSection = `\subsection` [hide_from_toc] heading { sequence | SubSubSections | Paragraphs }
SubSubSections = { [_WSPC] SubSubSection }+
SubSubSection = `\subsubsection` [hide_from_toc] heading { sequence | Paragraphs }
hide_from_toc = "*"
Paragraphs = { [_WSPC] Paragraph }+
Paragraph = "\paragraph" heading { sequence | SubParagraphs }
SubParagraphs = { [_WSPC] SubParagraph }+
SubParagraph = "\subparagraph" heading [ sequence ]
Bibliography = [_WSPC] "\bibliography" heading
Index = [_WSPC] "\printindex"
heading = block
#######################################################################
#
#: document content
#
#######################################################################
#### block environments ####
block_environment = &_has_block_start known_environment | generic_block
_has_block_start = `\begin{` | `\[`
known_environment = itemize | enumerate | description | figure | tabular | quotation
| verbatim | math_block
math_block = equation | eqnarray | displaymath
generic_block = begin_generic_block { sequence | item } §end_generic_block
begin_generic_block = <-&_LB begin_environment
@ end_generic_block_error = '', "A block environment must be followed by a linefeed, not by: {1}"
end_generic_block = end_environment § LFF
itemize = "\begin{itemize}" [_WSPC] { item | command~ } §"\end{itemize}"
enumerate = "\begin{enumerate}" [_WSPC] { item | command~ } §"\end{enumerate}"
description = "\begin{description}" [_WSPC] {item | command~ } §"\end{description}"
@item_error = '', '\item without proper content, found: {1}'
item = "\item" [config] § sequence
figure = "\begin{figure}" sequence §"\end{figure}"