Commit 6062648b authored by di68kap

Documentation extended

parent 235753d3
......@@ -1450,6 +1450,15 @@ and parsing can continue through to the end of the text.
In contrast to the skip-directive, the resume-directive leaves the parser
that raised the error and resumes one level higher up in the call chain.
The ``@ ..._resume``-directive tells the *calling*
parsers where to continue after the array parser has failed.
So, the parser resuming the parsing process is not the array parser that
has failed, but the first parser in the reverse call-stack of "array" that
catches up at the location indicated by the ``@ ..._resume``-directive.
The location itself is determined by a regular expression, where the
point for reentry is the location *after* the next match of the regular
expression::
Semantic Actions and Storing Variables
......
......@@ -4,20 +4,20 @@ Overview of DHParser
DHParser is a parser-generator and domain-specific-language (DSL) construction kit that
is designed to make the process of designing, implementing and revising a DSL as
simple as possible. It can be used in an adhoc-fashion for small projects where
the grammar can be specified in Python like `pyparsing <https://pypi.org/project/pyparsing/>`
the grammar can be specified in Python like `pyparsing <https://pypi.org/project/pyparsing/>`_
or in a slightly amended version of the
`Extended-Backus-Naur-Form (EBNF) <https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form>`
`Extended-Backus-Naur-Form (EBNF) <https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form>`_
directly within the Python-code. Or DHParser can be used for large projects where you set up a
directory tree with the grammar, parser and test-runner each residing in a separate file and the
test and example code in dedicated sub-directories.
DHParser uses `packrat parsing <https://bford.info/packrat/>` with full left-recursion support
DHParser uses `packrat parsing <https://bford.info/packrat/>`_ with full left-recursion support
which allows building parsers for any context-free grammar. It's got a post-mortem debugger
to analyse the parsing process and it offers facilities for unit-testing grammars and some
support for fail-tolerant parsing so that the parser does not stop at the first syntax error
it encounters. Finally, there is some support for writing language servers for DSLs
in Python that adhere to the editor-independent
`language server-protocol <https://microsoft.github.io/language-server-protocol/>`.
`language server-protocol <https://microsoft.github.io/language-server-protocol/>`_.
Adhoc-Parsers
......@@ -26,7 +26,7 @@ Adhoc-Parsers
In case you just need a parser for some very simple DSL, you can directly add a string
with the EBNF-grammar of that DSL to your Python code and compile it into an executable
parser much like you'd compile a regular expression. Let's do this for a
`JSON <https://www.json.org/json-en.html>`-parser::
`JSON <https://www.json.org/json-en.html>`_-parser::
import sys
from DHParser.dsl import create_parser
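# The rest of this example is truncated in the diff; a minimal continuation
# might look like the following sketch, with a toy grammar standing in for
# the full JSON-grammar of the original example:

toy_grammar = r'''
    document = ~ number _EOF
    number   = /[0-9]+/~
    _EOF     = !/./
'''
toy_parser = create_parser(toy_grammar, 'Toy')
print(toy_parser('42').as_sxpr())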
......@@ -609,10 +609,118 @@ text, the grammar-debugger helps to locate the cause of
an error that is not due to a faulty source text but to a
faulty grammar.
Fail-tolerant parsing
---------------------
Fail-tolerance is the ability of a parser to resume parsing after an
error has been encountered. A parser that is fail-tolerant does not
stop parsing at the first error but can report several if not all
errors in a source-code file in a single run. Thus, the user is
not forced to fix an earlier error before she is even informed
of the next one. Fail-tolerance is a particularly desirable property
when using a modern IDE that annotates errors while typing the
source code.
DHParser offers support for fail-tolerant parsing that goes beyond what
can be achieved within EBNF alone. A prerequisite for fail-tolerant parsing
is to annotate the grammar with ``§``-markers ("mandatory-markers") at
places where one can be sure that the parser annotated with the marker
must match if it is called at all. This is usually the case for parsers
in a series, after the point where the series as a whole is uniquely determined.
For example, once the opening bracket of a bracketed expression has
been matched by a parser, it is clear that eventually the closing bracket
must be matched by its respective parser, too, or it is an error. Thus, in
our JSON-grammar we could write::
array = "[" [ _element { "," _element } ] §"]"
The ``§`` advises the following parser(s) in the series to raise an error
on the spot instead of merely returning a non-match if they fail.
If we wanted to, we could also add a ``§``-marker in front of the second
``_element``-parser, because after a comma there must always be another
element in an array or it is an error.
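With this additional marker the array-rule would read::

array = "[" [ _element { "," §_element } ] §"]"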
The ``§``-marker can be supplemented with a ``@ ..._resume``-directive that
tells the calling parsers where to continue after the array parser has failed.
So, the parser resuming the parsing process is not the array parser that
has failed, but the first of the parsers in the call-stack of the array-parser that
catches up at the location indicated by the ``@ ..._resume``-directive.
The location itself is determined by a regular expression, where the
point for reentry is the location *after* the next match of the regular
expression::
@array_resume = /\]/
array = "[" [ _element { "," _element } ] §"]"
Here, the whole array up to and including the closing bracket ``]`` will
be skipped and the calling parser continues just as if the array had matched.
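The reentry-semantics can be illustrated with plain Python (a sketch using the
``re``-module, not DHParser-API; the variable names are made up)::

import re

source = '[1,2 3,4], "string": "two" }'
err_pos = source.index('3')  # position where the array parser failed
match = re.compile(r'\]').search(source, err_pos)  # next match of /\]/
print(source[match.end():])  # reentry *after* the match: ', "string": "two" }'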
Let's see the difference this makes by running both versions of the grammar
over a simple test case::
[match:json]
M1: '''{ "number": 1,
"array": [1,2 3,4],
"string": "two" }'''
First, without re-entrance and without the ``§``-marker, the error message is not very informative and
no structure has been detected correctly. At least the location of the error has been determined
with good precision by the "farthest failure"-principle::
### Error:
2:15: Error (1040): Parser "array->`,`" did not match: »3,4],
"string": "two ...«
Most advanced fail: 2, 15: json->_element->object->member->_element->array-> `,`; FAIL; "3,4],\n"string": "two" }"
Last match: 2, 13: json->_element->object->member->_element->array->_element->number; MATCH; "2 ";
### AST
(ZOMBIE__ (ZOMBIE__ `() '{ "number": 1,' "") (ZOMBIE__ '"array": [1,2 3,4],' '"string": "two" }'))
Secondly, still without re-entrance but with the ``§``-marker, the error message is more precise, though the
follow-up error "Parser stopped before end" may be confusing. The AST (not shown here) contains more
structure, but is still littered with ``ZOMBIE__``-nodes for unidentified parts of the input::
### Error:
2:12: Error (1040): Parser "json" stopped before end, at: 3,4],
"str ... Terminating parser.
2:15: Error (1010): `]` ~ expected by parser 'array', »3,4],\n "str...« found!
Finally, with both the ``§``-marker and the resume-directive as denoted in the EBNF snippet
above, we receive a sound error message and, even more surprisingly, an almost complete
AST::
### Error:
2:15: Error (1010): `]` ~ expected by parser 'array', »3,4],\n "str...« found!
### AST
(json
(object
(member
(string
(PLAIN "number"))
(number "1"))
(member
(string
(PLAIN "array"))
(array
(number "1")
(number "2")
(ZOMBIE__ `(2:15: Error (1010): `]` ~ expected by parser 'array', »3,4],\n "str...« found!) ",2 3,4]")))
(member
(string
(PLAIN "string"))
(string
(PLAIN "two")))))
Compiling DSLs
--------------
......
......@@ -10,8 +10,9 @@
json = ~ _element _EOF
_element = object | array | string | number | _bool | null
object = "{" member { "," §member } §"}"
member = string §":" _element
object = "{" member { "," member } "}"
member = string ":" _element
@array_resume = /(?:[^\[\]]|(?:\[.*\]))*\]\s*/ # /\]/
array = "[" [ _element { "," _element } ] §"]"
#: simple elements
......
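The refined resume-expression ``/(?:[^\[\]]|(?:\[.*\]))*\]\s*/`` in the grammar
above skips to the bracket that actually closes the array, stepping over
(single-line) nested arrays on the way. A quick check of what it consumes with
Python's ``re``-module (an illustrative sketch, not part of the committed code)::

import re

rx = re.compile(r'(?:[^\[\]]|(?:\[.*\]))*\]\s*')
print(rx.match('1,2 3,4], "s": 1').group())    # -> '1,2 3,4]'
print(rx.match('1,[2 3],4], "s": 1').group())  # -> '1,[2 3],4]'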
#!/usr/bin/env python3
#######################################################################
#
# SYMBOLS SECTION - Can be edited. Changes will be preserved.
#
#######################################################################
import collections
from functools import partial
import os
import sys
from typing import Tuple, List, Union, Any, Optional, Callable
try:
scriptpath = os.path.dirname(__file__)
except NameError:
scriptpath = ''
dhparser_parentdir = os.path.abspath(os.path.join(scriptpath, r'..\..'))
if scriptpath not in sys.path:
sys.path.append(scriptpath)
if dhparser_parentdir not in sys.path:
sys.path.append(dhparser_parentdir)
try:
import regex as re
except ImportError:
import re
from DHParser import start_logging, suspend_logging, resume_logging, is_filename, load_if_file, \
Grammar, Compiler, nil_preprocessor, PreprocessorToken, Whitespace, Drop, AnyChar, \
Lookbehind, Lookahead, Alternative, Pop, Text, Synonym, Counted, Interleave, INFINITE, \
Option, NegativeLookbehind, OneOrMore, RegExp, Retrieve, Series, Capture, TreeReduction, \
ZeroOrMore, Forward, NegativeLookahead, Required, CombinedParser, mixin_comment, \
compile_source, grammar_changed, last_value, matching_bracket, PreprocessorFunc, is_empty, \
remove_if, Node, TransformationFunc, TransformationDict, transformation_factory, traverse, \
remove_children_if, move_adjacent, normalize_whitespace, is_anonymous, matches_re, \
reduce_single_child, replace_by_single_child, replace_or_reduce, remove_whitespace, \
replace_by_children, remove_empty, remove_tokens, flatten, all_of, any_of, \
merge_adjacent, collapse, collapse_children_if, transform_content, WHITESPACE_PTYPE, \
TOKEN_PTYPE, remove_children, remove_content, remove_brackets, change_tag_name, \
remove_anonymous_tokens, keep_children, is_one_of, not_one_of, has_content, apply_if, peek, \
remove_anonymous_empty, keep_nodes, traverse_locally, strip, lstrip, rstrip, \
transform_content, replace_content_with, forbid, assert_content, remove_infix_operator, \
add_error, error_on, recompile_grammar, left_associative, lean_left, set_config_value, \
get_config_value, node_maker, access_thread_locals, access_presets, PreprocessorResult, \
finalize_presets, ErrorCode, RX_NEVER_MATCH, set_tracer, resume_notices_on, \
trace_history, has_descendant, neg, has_ancestor, optional_last_value, insert, \
positions_of, replace_tag_names, add_attributes, delimit_children, merge_connected, \
has_attr, has_parent, ThreadLocalSingletonFactory, Error, canonical_error_strings, \
has_errors, ERROR, FATAL, set_preset_value, get_preset_value, NEVER_MATCH_PATTERN, \
gen_find_include_func, preprocess_includes, make_preprocessor, chain_preprocessors
#######################################################################
#
# PREPROCESSOR SECTION - Can be edited. Changes will be preserved.
#
#######################################################################
RE_INCLUDE = NEVER_MATCH_PATTERN
# To capture includes, replace the NEVER_MATCH_PATTERN
# by a pattern with group "name" here, e.g. r'\input{(?P<name>.*)}'
def JSONTokenizer(original_text) -> Tuple[str, List[Error]]:
# Here, a function body can be filled in that adds preprocessor tokens
# to the source code and returns the modified source.
return original_text, []
def preprocessor_factory() -> PreprocessorFunc:
# below, the second parameter must always be the same as JSONGrammar.COMMENT__!
find_next_include = gen_find_include_func(RE_INCLUDE, '(?:\\/\\/.*)|(?:\\/\\*(?:.|\\n)*?\\*\\/)')
include_prep = partial(preprocess_includes, find_next_include=find_next_include)
tokenizing_prep = make_preprocessor(JSONTokenizer)
return chain_preprocessors(include_prep, tokenizing_prep)
get_preprocessor = ThreadLocalSingletonFactory(preprocessor_factory, ident=1)
def preprocess_JSON(source):
return get_preprocessor()(source)
#######################################################################
#
# PARSER SECTION - Don't edit! CHANGES WILL BE OVERWRITTEN!
#
#######################################################################
class JSONGrammar(Grammar):
r"""Parser for a JSON source file.
"""
_element = Forward()
source_hash__ = "9300fea6a90011a475ad52476f5b752f"
disposable__ = re.compile('_\\w+')
static_analysis_pending__ = [] # type: List[bool]
parser_initialization__ = ["upon instantiation"]
resume_rules__ = {'array': (re.compile(r'(?:[^\[\]]|(?:\[.*\]))*\]\s*'),)}
COMMENT__ = r'(?:\/\/.*)|(?:\/\*(?:.|\n)*?\*\/)'
comment_rx__ = re.compile(COMMENT__)
WHITESPACE__ = r'\s*'
WSP_RE__ = mixin_comment(whitespace=WHITESPACE__, comment=COMMENT__)
wsp__ = Whitespace(WSP_RE__)
dwsp__ = Drop(Whitespace(WSP_RE__))
_EOF = NegativeLookahead(RegExp('.'))
EXP = Series(Alternative(Text("E"), Text("e")), Option(Alternative(Text("+"), Text("-"))), RegExp('[0-9]+'))
FRAC = Series(Text("."), RegExp('[0-9]+'))
INT = Series(Option(Text("-")), Alternative(RegExp('[1-9][0-9]+'), RegExp('[0-9]')))
HEX = RegExp('[0-9a-fA-F][0-9a-fA-F]')
UNICODE = Series(Series(Drop(Text("\\u")), dwsp__), HEX, HEX)
ESCAPE = Alternative(RegExp('\\\\[/bnrt\\\\]'), UNICODE)
PLAIN = RegExp('[^"\\\\]+')
_CHARACTERS = ZeroOrMore(Alternative(PLAIN, ESCAPE))
null = Series(Text("null"), dwsp__)
false = Series(Text("false"), dwsp__)
true = Series(Text("true"), dwsp__)
_bool = Alternative(true, false)
number = Series(INT, Option(FRAC), Option(EXP), dwsp__)
string = Series(Text('"'), _CHARACTERS, Text('"'), dwsp__, mandatory=1)
array = Series(Series(Drop(Text("[")), dwsp__), Option(Series(_element, ZeroOrMore(Series(Series(Drop(Text(",")), dwsp__), _element)))), Series(Drop(Text("]")), dwsp__), mandatory=2)
member = Series(string, Series(Drop(Text(":")), dwsp__), _element)
object = Series(Series(Drop(Text("{")), dwsp__), member, ZeroOrMore(Series(Series(Drop(Text(",")), dwsp__), member)), Series(Drop(Text("}")), dwsp__))
_element.set(Alternative(object, array, string, number, _bool, null))
json = Series(dwsp__, _element, _EOF)
root__ = json
_raw_grammar = ThreadLocalSingletonFactory(JSONGrammar, ident=1)
def get_grammar() -> JSONGrammar:
grammar = _raw_grammar()
if get_config_value('resume_notices'):
resume_notices_on(grammar)
elif get_config_value('history_tracking'):
set_tracer(grammar, trace_history)
return grammar
def parse_JSON(document, start_parser = "root_parser__", *, complete_match=True):
return get_grammar()(document, start_parser, complete_match)
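# Usage sketch (illustrative, not part of the generated scaffolding):
#
#   tree = parse_JSON('{"number": 1}')
#   print(tree.as_sxpr())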
#######################################################################
#
# AST SECTION - Can be edited. Changes will be preserved.
#
#######################################################################
JSON_AST_transformation_table = {
# AST Transformations for the JSON-grammar
# "<": flatten
# "*": replace_by_single_child
# ">": []
"json": [],
"_element": [],
"object": [],
"member": [],
"array": [],
"string": [remove_brackets],
"number": [collapse],
"_bool": [],
"true": [],
"false": [],
"null": [],
"_CHARACTERS": [],
"PLAIN": [],
"ESCAPE": [],
"UNICODE": [],
"HEX": [],
"INT": [],
"NEG": [],
"FRAC": [],
"DOT": [],
"EXP": [],
"_EOF": [],
}
def JSONTransformer() -> TransformationFunc:
"""Creates a transformation function that does not share state with other
threads or processes."""
return partial(traverse, processing_table=JSON_AST_transformation_table.copy())
get_transformer = ThreadLocalSingletonFactory(JSONTransformer, ident=1)
def transform_JSON(cst):
get_transformer()(cst)
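# Usage sketch (illustrative): the transformer modifies the tree in place,
# applying the rules from JSON_AST_transformation_table above:
#
#   cst = parse_JSON('[1, 2]')
#   transform_JSON(cst)  # cst now carries the simplified AST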
#######################################################################
#
# COMPILER SECTION - Can be edited. Changes will be preserved.
#
#######################################################################
class JSONCompiler(Compiler):
"""Compiler for the abstract-syntax-tree of a JSON source file.
"""
def __init__(self):
super(JSONCompiler, self).__init__()
def reset(self):
super().reset()
# initialize your variables here, not in the constructor!
def on_json(self, node):
return self.fallback_compiler(node)
# def on__element(self, node):
# return node
# def on_object(self, node):
# return node
# def on_member(self, node):
# return node
# def on_array(self, node):
# return node
# def on_string(self, node):
# return node
# def on_number(self, node):
# return node
# def on__bool(self, node):
# return node
# def on_true(self, node):
# return node
# def on_false(self, node):
# return node
# def on_null(self, node):
# return node
# def on__CHARACTERS(self, node):
# return node
# def on_PLAIN(self, node):
# return node
# def on_ESCAPE(self, node):
# return node
# def on_UNICODE(self, node):
# return node
# def on_HEX(self, node):
# return node
# def on_INT(self, node):
# return node
# def on_NEG(self, node):
# return node
# def on_FRAC(self, node):
# return node
# def on_DOT(self, node):
# return node
# def on_EXP(self, node):
# return node
# def on__EOF(self, node):
# return node
get_compiler = ThreadLocalSingletonFactory(JSONCompiler, ident=1)
def compile_JSON(ast):
return get_compiler()(ast)
#######################################################################
#
# END OF DHPARSER-SECTIONS
#
#######################################################################
RESULT_FILE_EXTENSION = ".sxpr" # Change this according to your needs!
def compile_src(source: str) -> Tuple[Any, List[Error]]:
"""Compiles ``source`` and returns (result, errors)."""
result_tuple = compile_source(source, get_preprocessor(), get_grammar(), get_transformer(),
get_compiler())
return result_tuple[:2] # drop the AST at the end of the result tuple
def serialize_result(result: Any) -> Union[str, bytes]:
"""Serialization of result. REWRITE THIS, IF YOUR COMPILATION RESULT
IS NOT A TREE OF NODES.
"""
if isinstance(result, Node):
return result.serialize(how='default' if RESULT_FILE_EXTENSION != '.xml' else 'xml')
else:
return repr(result)
def process_file(source: str, result_filename: str = '') -> str:
"""Compiles the source and writes the serialized results back to disk,
unless any fatal errors have occurred. Error and Warning messages are
written to a file with the same name as `result_filename` with an
appended "_ERRORS.txt" or "_WARNINGS.txt" in place of the name's
extension. Returns the name of the error-messages file or an empty
string, if no errors or warnings occurred.
"""
source_filename = source if is_filename(source) else ''
result, errors = compile_src(source)
if not has_errors(errors, FATAL):
if os.path.abspath(source_filename) != os.path.abspath(result_filename):
with open(result_filename, 'w') as f:
f.write(serialize_result(result))
else:
errors.append(Error('Source and destination have the same name "%s"!'
% result_filename, 0, FATAL))
if errors:
err_ext = '_ERRORS.txt' if has_errors(errors, ERROR) else '_WARNINGS.txt'
err_filename = os.path.splitext(result_filename)[0] + err_ext
with open(err_filename, 'w') as f:
f.write('\n'.join(canonical_error_strings(errors)))
return err_filename
return ''
def batch_process(file_names: List[str], out_dir: str,
*, submit_func: Optional[Callable] = None,
log_func: Optional[Callable] = None) -> List[str]:
"""Compiles all files listed in filenames and writes the results and/or
error messages to the directory `out_dir`. Returns a list of
error-message files.
"""
error_list = []
def gen_dest_name(name):
return os.path.join(out_dir, os.path.splitext(os.path.basename(name))[0] \
+ RESULT_FILE_EXTENSION)
def run_batch(submit_func: Callable):
nonlocal error_list
err_futures = []
for name in file_names:
dest_name = gen_dest_name(name)
err_futures.append(submit_func(process_file, name, dest_name))
for file_name, err_future in zip(file_names, err_futures):
error_filename = err_future.result()
if log_func:
log_func('Compiling "%s"' % file_name)
if error_filename:
error_list.append(error_filename)
if submit_func is None:
import concurrent.futures
from DHParser.toolkit import instantiate_executor
with instantiate_executor(get_config_value('batch_processing_parallelization'),
concurrent.futures.ProcessPoolExecutor) as pool:
run_batch(pool.submit)
else:
run_batch(submit_func)
return error_list
if __name__ == "__main__":
# recompile grammar if needed
if __file__.endswith('Parser.py'):
grammar_path = os.path.abspath(__file__).replace('Parser.py', '.ebnf')
else:
grammar_path = os.path.splitext(__file__)[0] + '.ebnf'
parser_update = False
def notify():
global parser_update
parser_update = True
print('recompiling ' + grammar_path)
if os.path.exists(grammar_path) and os.path.isfile(grammar_path):
if not recompile_grammar(grammar_path, force=False, notify=notify):
error_file = os.path.basename(__file__).replace('Parser.py', '_ebnf_ERRORS.txt')
with open(error_file, encoding="utf-8") as f:
print(f.read())
sys.exit(1)
elif parser_update:
print(os.path.basename(__file__) + ' has changed. '
'Please run again in order to apply the updated compiler.')
sys.exit(0)
else:
print('Could not check whether grammar requires recompiling, '
'because grammar was not found at: ' + grammar_path)
from argparse import ArgumentParser
parser = ArgumentParser(description="Parses a JSON-file and shows its syntax-tree.")
parser.add_argument('files', nargs='+')
parser.add_argument('-d', '--debug', action='store_const', const='debug',
help='Store debug information in LOGS subdirectory')
parser.add_argument('-x', '--xml', action='store_const', const='xml',
help='Store result as XML instead of S-expression')
parser.add_argument('-o', '--out', nargs=1, default=['out'],
help='Output directory for batch processing')
parser.add_argument('-v', '--verbose', action='store_const', const='verbose',
help='Verbose output')
parser.add_argument('--singlethread', action='store_const', const='singlethread',
help='Run batch jobs in a single thread (recommended only for debugging)')
args = parser.parse_args()
file_names, out, log_dir = args.files, args.out[0], ''
# if not os.path.exists(file_name):
# print('File "%s" not found!' % file_name)
# sys.exit(1)
# if not os.path.isfile(file_name):
# print('"%s" is not a file!' % file_name)
# sys.exit(1)
if args.debug is not None:
log_dir = 'LOGS'
set_config_value('history_tracking', True)
set_config_value('resume_notices', True)
set_config_value('log_syntax_trees', set(['cst', 'ast']))  # don't use a set literal here
start_logging(log_dir)
if args.xml:
RESULT_FILE_EXTENSION = '.xml'