Commit 235753d3 authored by di68kap

Documentation extended

parent e9609524
"python.pythonPath": "venv/bin/python",
"python.pythonPath": "C:\\Users\\di68kap\\AppData\\Local\\Programs\\Python\\Python39\\python.exe",
"python.linting.pylintEnabled": true,
"python.linting.enabled": true,
"python.testing.nosetestArgs": [
......@@ -54,10 +54,8 @@ parser much like you'd compile a regular expression. Let's do this for a
HEX = /[0-9a-fA-F][0-9a-fA-F]/
INT = [NEG] ( /[1-9][0-9]+/ | /[0-9]/ )
NEG = `-`
FRAC = DOT /[0-9]+/
DOT = `.`
INT = [`-`] ( /[1-9][0-9]+/ | /[0-9]/ )
FRAC = `.` /[0-9]+/
EXP = (`E`|`e`) [`+`|`-`] /[0-9]+/
_EOF = !/./
......@@ -106,7 +104,7 @@ can be generated right inside a Python-program.
Nodes, the name of which starts with a colon ":" are nodes that have
been produced by an unnamed part of a parser, in this case the parts
that parse the quotation marks within the string-parser. Usually, such
nodes are either renamed or removed during abstract-syntaxtree-transformation.
nodes are either renamed or removed during abstract-syntax-tree-transformation.
The three lines starting with an ``@``-sign at the beginning of the
grammar-string are DHParser-directives (see :py:mod:`ebnf`) which
......@@ -125,10 +123,8 @@ instead of compiling an EBNF-grammar first::
_dwsp = Drop(Whitespace(r'\s*'))
_EOF = NegativeLookahead(RegExp('.'))
EXP = ((Text("E") | Text("e")) + Option(Text("+") | Text("-")) + RegExp(r'[0-9]+')).name('EXP')
DOT = Text(".").name('DOT')
FRAC = (DOT + RegExp(r'[0-9]+')).name('FRAC')
NEG = Text("-").name('NEG')
INT = (Option(NEG) + RegExp(r'[1-9][0-9]+') | RegExp(r'[0-9]')).name('INT')
FRAC = (Text(".") + RegExp(r'[0-9]+')).name('FRAC')
INT = (Option(Text("-")) + (RegExp(r'[1-9][0-9]+') | RegExp(r'[0-9]'))).name('INT')
HEX = RegExp(r'[0-9a-fA-F][0-9a-fA-F]').name('HEX')
UNICODE = (DTKN("\\u") + HEX + HEX).name('unicode')
ESCAPE = (RegExp('\\\\[/bnrt\\\\]') | UNICODE).name('ESCAPE')
......@@ -222,7 +218,7 @@ with a name of a project-directory that will then be created and filled with som
$ dhparser JSON
$ cd JSON
$ dir
example.dsl JSON.ebnf tests_grammar
example.dsl JSON.ebnf tests_grammar
The first step is to replace the ".ebnf"-file that contains a simple demo-grammar with your
own grammar. For the sake of the example we'll write our json-Grammar into this file::
......@@ -235,7 +231,7 @@ own grammar. For the sake of the example we'll write our json-Grammar into this
@drop = whitespace, strings # silently drop bare strings and whitespace
@disposable = /_\w+/ # regular expression to identify disposable symbols
#: compound elememts
#: compound elements
json = ~ _element _EOF
_element = object | array | string | number | _bool | null
......@@ -260,10 +256,8 @@ own grammar. For the sake of the example we'll write our json-Grammar into this
HEX = /[0-9a-fA-F][0-9a-fA-F]/
INT = [NEG] ( /[1-9][0-9]+/ | /[0-9]/ )
NEG = `-`
FRAC = DOT /[0-9]+/
DOT = `.`
INT = [`-`] ( /[1-9][0-9]+/ | /[0-9]/ )
FRAC = `.` /[0-9]+/
EXP = (`E`|`e`) [`+`|`-`] /[0-9]+/
_EOF = !/./
......@@ -276,7 +270,7 @@ The ````-script is the most important tool in any DSL-project.
The script generates or updates the ````-program if the grammar
has changed and runs the unit tests in the ``tests_grammar`` subdirectory.
After filling in the above grammar in the ``JSON.ebnf``-file, a parser can
be generated by running the test skript::
be generated by running the test script::
$ python
......@@ -331,8 +325,8 @@ To reach this goal DHParser follows a few, mostly intuitive, conventions:
grammar clear of too many whitespace markers.
In case you want to grab a string without
eating its adjacent whitespace, you can still use the "backticked"
notation for string literals ```backticked string```.
eating its adjacent whitespace, you can still use the "back-ticked"
notation for string literals ```back-ticked string```.
6. DHParser can be advised (via the ``@drop``-directive) to drop
string-tokens completely from the syntax-tree and, likewise,
......@@ -527,13 +521,93 @@ Test-driven grammar development
Just as with regular expressions, it is quite difficult to get
EBNF-grammars right on the first try - especially if you are
new to the technology. For regular expressions there exist
all kinds of "workbenches" to try and test regular expressions.
- Debugging parsers
new to the technology. DHParser offers a unit-testing
environment and a debugger for EBNF-grammars which
is helpful when learning to work with parser-technology
and almost indispensable when refactoring the grammar of
evolving DSLs.
This unit-testing system is quite simple to handle: Tests
for any symbol of the grammar are written into ``.ini``-files
in the ``tests_grammar`` sub-directory of the DSL-project.
Test-cases look like this::
M1: "-3.2E-32"
M2: "42"
Here, we test whether the parser "number" really matches the
given strings as we would expect. "M1" and "M2" are arbitrary
names for the individual test-cases. Since parsers should not
only match strings that conform to the grammar of that
parser, but must also fail to match strings that don't, it
is also possible to specify "fail-tests"::
F1: "π"
Running the ````-script on a test-file in
the test-directory yields the results of those tests::
$ python tests_grammar/02_test_simple_elements.ini
GRAMMAR TEST UNIT: 02_test_simple_elements
Match-Tests for parser "number"
match-test "M1" ... OK
match-test "M2" ... OK
Fail-Tests for parser "number"
fail-test "F1" ... OK
SUCCESS! All tests passed :-)
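What match- and fail-tests check can be imitated outside DHParser with
Python's ``re`` module. The following is only a minimal sketch: the regular
expression is a hand-written approximation of the rules ``INT``, ``FRAC``
and ``EXP`` from the grammar above, and the names ``NUMBER`` and
``run_tests`` are hypothetical helpers, not part of DHParser.

```python
import re

# Hand-written approximation of the grammar rules INT, FRAC and EXP.
# This is an assumption for illustration, not DHParser's generated parser.
INT = r'-?(?:[1-9][0-9]+|[0-9])'
FRAC = r'\.[0-9]+'
EXP = r'[Ee][+-]?[0-9]+'
NUMBER = re.compile(f'{INT}(?:{FRAC})?(?:{EXP})?')

# The same test-cases as in the test-file above.
match_tests = {'M1': '-3.2E-32', 'M2': '42'}
fail_tests = {'F1': 'π'}


def run_tests(pattern, match_tests, fail_tests):
    """Returns the names of all test-cases that did not behave as expected."""
    failed = []
    for name, text in match_tests.items():
        # a match-test passes only if the complete string is matched
        if pattern.fullmatch(text) is None:
            failed.append(name)
    for name, text in fail_tests.items():
        # a fail-test passes only if the string is NOT matched
        if pattern.fullmatch(text) is not None:
            failed.append(name)
    return failed


failures = run_tests(NUMBER, match_tests, fail_tests)
print('SUCCESS!' if not failures else 'FAILED: ' + ', '.join(failures))
```

Unlike this sketch, the real test-script also records the syntax-trees
of the matches, as described in the following.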
In addition to this summary-report the test-script stores
detailed reports of all tests for each test-file into
Markdown-documents in the "tests_grammar/REPORT" directory.
These reports contain the ASTs of all matches and the
error messages for all fail-tests. If we look at the
AST of the first match-test "M1" we might find to our
surprise that it is not what we expect, but much more verbose::
(number (INT (NEG "-") (:RegExp "3"))
(FRAC (DOT ".") (:RegExp "2"))
(EXP (:Text "E") (:Text "-") (:RegExp "32")))
None of these details are really needed in an abstract syntax-tree.
Luckily, ASTs can also be tested for, which allows developing
AST-generation in a test-driven manner. We simply need to add
an AST-test to the test-file with the same name as the match-test
that yields the AST we'd like to test::
M1: (number "-3.2E-32")
Running the test-suite will, of course, yield a failure for the
AST-Test until we fix the issue, which in this case could be done
by adding ``"number": [collapse]`` to our AST-transformations.
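The effect that such a collapsing transformation is expected to have on the
verbose tree can be sketched in plain Python. The tuple-based tree and the
helper functions below are assumptions made for illustration only; they are
neither DHParser's node class nor its implementation of ``collapse``.

```python
# A node is modelled as (name, content), where content is either a string
# (leaf) or a list of child nodes. This layout is an assumption for
# illustration and not DHParser's node class.
cst = ('number', [
    ('INT', [('NEG', '-'), (':RegExp', '3')]),
    ('FRAC', [('DOT', '.'), (':RegExp', '2')]),
    ('EXP', [(':Text', 'E'), (':Text', '-'), (':RegExp', '32')]),
])


def flatten(node):
    """Concatenates the string content of all leaves below the node."""
    name, content = node
    if isinstance(content, str):
        return content
    return ''.join(flatten(child) for child in content)


def collapse(node):
    """Replaces the children of a node by their concatenated content."""
    return (node[0], flatten(node))


def as_sxpr(node):
    """Serializes a node as an S-expression like those in the reports."""
    name, content = node
    if isinstance(content, str):
        return f'({name} "{content}")'
    return f'({name} ' + ' '.join(as_sxpr(child) for child in content) + ')'


print(as_sxpr(collapse(cst)))  # (number "-3.2E-32")
```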
Since it is sometimes helpful to inspect the CST as well, a
match test's name can be marked with an asterisk, e.g.
``M1*: "-3.2E-32"`` to include the CST for this test in the
report, too.
If a parser fails to match, it is sometimes hard to tell which
mistake in the grammar definition is responsible for that
failure. DHParser's testing-framework therefore includes a
post-mortem debugger that delivers a detailed account of the
parsing process up to the failure. These accounts are
written in HTML-format into the ``tests_grammar/LOGS``-subdirectory
and can be viewed with a browser.
To see what this looks like, let's introduce a little mistake
into our grammar. Let's assume that we had forgotten that
the exponent of a decimal number can also be introduced by
a capital letter "E": ``EXP = `e` [`+`|`-`] /[0-9]+/``.
.. image:: debugger_snippet.png
:alt: a screenshot of DHParser's post-mortem-debugger
While error messages help to locate errors in the source
text, the grammar-debugger helps to locate the cause of
an error that is not due to a faulty source text but to a
faulty grammar.
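The kind of information one looks for in such a situation, namely how far
the parser got before it became stuck, can be illustrated with a small
sketch based on Python's ``re`` module. The regular expression is a
hand-written approximation of the faulty grammar with the lower-case-only
``EXP`` rule; it is an assumption for illustration and says nothing about
how DHParser's debugger works internally.

```python
import re

# Hand-written approximation of the number-rule with the faulty EXP rule.
INT = r'-?(?:[1-9][0-9]+|[0-9])'
FRAC = r'\.[0-9]+'
FAULTY_EXP = r'e[+-]?[0-9]+'       # the capital "E" has been forgotten
FAULTY_NUMBER = re.compile(f'{INT}(?:{FRAC})?(?:{FAULTY_EXP})?')

text = '-3.2E-32'

# fullmatch() fails, because the exponent cannot be parsed ...
assert FAULTY_NUMBER.fullmatch(text) is None

# ... while match() reveals how much of the text could be consumed.
m = FAULTY_NUMBER.match(text)
consumed, rest = text[:m.end()], text[m.end():]
print(f'matched {consumed!r}, stuck before {rest!r}')
```

The unparsed rest begins with the capital "E", which points to the ``EXP``
rule as the likely culprit.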
Fail-tolerant parsing
......@@ -542,11 +616,6 @@ Fail-tolerant parsing
Compiling DSLs
- XML-Connection
Language Servers
# EBNF-Directives
@literalws = right # eat insignificant whitespace to the right of literals
@whitespace = /\s*/ # regular expression for insignificant whitespace
@comment = /(?:\/\/.*)|(?:\/\*(?:.|\n)*?\*\/)/ # C++ style comments
@drop = whitespace, strings # silently drop bare strings and whitespace
@disposable = /_\w+/ # regular expression to identify disposable symbols
#: compound elements
json = ~ _element _EOF
_element = object | array | string | number | _bool | null
object = "{" member { "," §member } §"}"
member = string §":" _element
array = "[" [ _element { "," _element } ] §"]"
#: simple elements
string = `"` §_CHARACTERS `"` ~
number = INT [ FRAC ] [ EXP ] ~
_bool = true | false
true = `true` ~
false = `false` ~
null = "null"
#: atomic expressions types
PLAIN = /[^"\\]+/
ESCAPE = /\\[\/bnrt\\]/ | UNICODE
HEX = /[0-9a-fA-F][0-9a-fA-F]/
INT = [`-`] ( /[1-9][0-9]+/ | /[0-9]/ )
FRAC = `.` /[0-9]+/
EXP = (`E`|`e`) [`+`|`-`] /[0-9]+/
_EOF = !/./
## License
JSON is open source software under the [Apache 2.0 License](
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
#!/usr/bin/env python3

""" - runs the unit tests for the JSON-grammar
"""

import os
import sys

LOGGING = 'LOGS'  # assumed; the definition is missing from this excerpt
DEBUG = True
TEST_DIRNAME = 'tests_grammar'

scriptpath = os.path.dirname(__file__)
dhparserdir = os.path.abspath(os.path.join(scriptpath, '..', '..'))
if scriptpath not in sys.path:
    sys.path.append(scriptpath)  # assumed; line missing from this excerpt
if dhparserdir not in sys.path:
    sys.path.append(dhparserdir)  # assumed; line missing from this excerpt

try:
    from DHParser.configuration import access_presets, set_preset_value
    from DHParser import dsl
    import DHParser.log
    from DHParser import testing
except ModuleNotFoundError:
    print('Could not import DHParser. Please adjust sys.path in file '
          '"%s" manually' % __file__)
    sys.exit(1)  # assumed; line missing from this excerpt


def recompile_grammar(grammar_src, force):
    grammar_tests_dir = os.path.join(scriptpath, TEST_DIRNAME)
    testing.create_test_templates(grammar_src, grammar_tests_dir)
    # recompiles Grammar only if it has changed
    if not dsl.recompile_grammar(grammar_src, force=force,
            notify=lambda: print('recompiling ' + grammar_src)):
        print('\nErrors while recompiling "%s":' % grammar_src)
        with open('JSON_ebnf_ERRORS.txt', encoding='utf-8') as f:
            print(f.read())  # assumed; the body is missing from this excerpt
        sys.exit(1)  # assumed; line missing from this excerpt


def run_grammar_tests(glob_pattern, get_grammar, get_transformer):
    testdir = os.path.join(scriptpath, TEST_DIRNAME)
    DHParser.log.start_logging(os.path.join(testdir, LOGGING))
    error_report = testing.grammar_suite(
        testdir, get_grammar, get_transformer,
        fn_patterns=[glob_pattern], report='REPORT', verbose=True)
    return error_report


if __name__ == '__main__':
    argv = sys.argv[:]
    if len(argv) > 1 and sys.argv[1] == "--debug":
        DEBUG = True
        del argv[1]

    # set_preset_value('test_parallelization', True)
    if DEBUG:
        set_preset_value('history_tracking', True)

    if (len(argv) >= 2 and (argv[1].endswith('.ebnf') or
            os.path.splitext(argv[1])[1].lower() in testing.TEST_READERS.keys())):
        # if called with a single filename that is either an EBNF file or a known
        # test file type then use the given argument
        arg = argv[1]
    else:  # assumed; the branch keyword is missing from this excerpt
        # otherwise run all tests in the test directory
        arg = '*_test_*.ini'
    if arg.endswith('.ebnf'):
        recompile_grammar(arg, force=True)
    else:  # assumed; the branch keyword is missing from this excerpt
        recompile_grammar(os.path.join(scriptpath, 'JSON.ebnf'),
                          force=False)  # assumed; argument missing from this excerpt
        from JSONParser import get_grammar, get_transformer
        error_report = run_grammar_tests(arg, get_grammar, get_transformer)
        if error_report:
            print(error_report)  # assumed; the body is missing from this excerpt
            sys.exit(1)  # assumed; line missing from this excerpt