Commit 628adcb1 authored by di68kap

- small corrections in "StepByStep.rst"

parent 27294258
@@ -33,7 +33,7 @@ installing the packages from the Python Package Index (PyPI).
This section takes you from cloning the DHParser git repository to setting up
a new DHParser-project in the ``experimental``-subdirectory and testing
whether the setup works. Similarly to current web development practices, most
of the work with DHParser is done from the shell. In the following, we assume
a Unix (Linux) environment. The same can most likely be done on other
operating systems in a very similar way, but there might be subtle
@@ -65,7 +65,7 @@ files and directories for sure, but those will not concern us for now::
In order to verify that the installation works, you can simply run the
"dhparser.py" script and, when asked, chose "3" for the self-test. If the
self-test runs through without error, the installation has succeeded.
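
For instance, from within the cloned DHParser directory (assuming a Unix
shell)::

    $ python dhparser.py
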
Starting a new DHParser project
-------------------------------
@@ -148,9 +148,9 @@ Generally speaking, the compilation process consists of three stages:
Now, DHParser can fully automate the generation of a parser from a
syntax-description in EBNF-form, like our "poetry.ebnf", but it cannot
automate the transformation from the concrete into the abstract syntax tree
(which for the sake of brevity we will simply call "AST-Transformation" in the
following), and neither can it automate the compilation of the abstract syntax
tree into something more useful. Therefore, the AST-Transformation in the
autogenerated compile-script is simply left empty, while the compiling stage
merely converts the syntax tree into a pseudo-XML-format.
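
To make this division of labour tangible, here is a toy model of the three
stages in plain Python. This is not DHParser's actual code; all names are
invented for the purpose of illustration::

    import re

    def parse(text):
        # Stage 1: parsing (auto-generated from the EBNF-grammar). Yields a
        # concrete syntax tree, here crudely modelled as tagged words.
        return [('WORD', word) for word in re.findall(r'\w+', text)]

    def transform(tree):
        # Stage 2: AST-transformation (to be filled in by hand). Discards
        # details that are irrelevant for further processing.
        return [word for tag, word in tree]

    def compile_ast(ast):
        # Stage 3: compilation (to be filled in by hand), here: pseudo-XML.
        return '\n'.join('<WORD>%s</WORD>' % word for word in ast)

    print(compile_ast(transform(parse('Life is but a walking shadow'))))
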
@@ -164,7 +164,7 @@ This also means that the parser part of this script will be overwritten and
should never be edited by hand. The other two stages can and should be edited
by hand. Stubs for these parts of the compile-script will only be generated
if the compile-script does not yet exist, that is, on the very first calling
of the test-script.
Usually, if you have adjusted the grammar, you will want to run the unit tests
anyway. Therefore, the regeneration of the parser-part of the compile-script
@@ -181,13 +181,13 @@ revising the AST-transformations and the compiler time and again it helps if
the grammar has been worked out before. A bit of interlocking between these
steps does not hurt, though.
A reasonable workflow for developing the grammar proceeds like this:
1. Set out by writing down a few example documents for your DSL. It is
advisable to start with a few simple examples that use only a subset of the
intended features of your DSL.
2. Next you sketch a grammar for your DSL that is just rich enough to capture
those examples.
3. Right after sketching the grammar you should write test cases for your
@@ -249,7 +249,7 @@ Now, there are exactly three rules that make up this grammar::
    EOF = !/./
EBNF-Grammars describe the structure of a domain specific notation in top-down
fashion. Thus, the first rule in the grammar describes the components out of
which a text or document in the domain specific notation is composed as a
whole. The following rules then break down the components into even smaller
components until, finally, there are only atomic components left which are
@@ -257,7 +257,7 @@ described by matching rules. Matching rules are rules that do not refer to
other rules any more. They consist of string literals or regular expressions
that "capture" the sequences of characters which form the atomic components of
our DSL. Rules in general always consist of a symbol on the left hand side of
a "="-sign (which in this context can be unterstood as a definition signifier)
a "="-sign (which in this context can be understood as a definition signifier)
and the definition of the rule on the right hand side.
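
For reference, the three rules together read roughly as follows ("WORD" and
"EOF" are quoted verbatim in this tutorial; the "document"-rule is
reconstructed from the surrounding prose and may differ in detail)::

    document = ~ { WORD } EOF
    WORD     = /\w+/~
    EOF      = !/./
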
.. note:: Traditional parser technology for context-free grammars often
@@ -299,7 +299,7 @@ expressions. If you do not know about regular expressions yet, you should head
over to an explanation or tutorial on regular expressions, like
https://docs.python.org/3/library/re.html, before continuing, because we are
not going to discuss them here. In DHParser-Grammars regular expressions are
enclosed by simple forward slashes "/". Everything between two forward slashes
is a regular expression as it would be understood by Python's "re"-module.
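
A quick check in the Python-shell illustrates the principle (using only the
standard "re"-module, independently of DHParser)::

    >>> import re
    >>> re.match(r'\w+', 'word_1').group()
    'word_1'
    >>> re.match(r'\w+', '...') is None
    True
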
Thus the rule ``WORD = /\w+/~`` means that a word consists of a sequence of
letters, numbers or underscores '_' that must be at least one character long.
This
@@ -335,10 +335,10 @@ again::
    $ python tst_poetry_grammar.py
But what is that? A whole lot of error messages? Well, this is not surprising:
because we changed the grammar, some of our old test-cases fail with the new
grammar. So we will have to update our test-cases. Actually, the grammar
gets compiled nevertheless and we could just ignore the test failures and
carry on with compiling our "example.dsl"-file again. But, for this time,
we'll follow good practice and adjust the test cases. So open the test that
failed, "grammar_tests/02_test_document.ini", in the editor and add full stops
@@ -361,9 +361,9 @@ following examples should be matched by the grammar-rule. "fail" means they
should *not* match. It is just as important that a parser or grammar-rule
does not match those strings it should not match as it is that it matches
those strings that it should match. The individual test-cases all get a name,
in this case M1, M2, F1, but if you prefer more meaningful names this is also
possible. (Beware, however, that the names for the match-tests must be
different from the names for the fail-tests for the same rule!) Now, run the
test-script again
and you'll see that no errors get reported any more.
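
For reference, such a test-file could look roughly like this (the example
sentences are invented; only the sectioning- and naming-scheme matters)::

    [match:document]
    M1: This is a sequence of words.
    M2: It even contains two sentences.

    [fail:document]
    F1: This example lacks its full stop
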
Finally, we can recompile our "example.dsl"-file, and by its XML output we can
@@ -373,7 +373,7 @@ tell that it worked::
So far, we have seen *in nuce* how the development workflow for building up a
DSL-grammar goes. Let's take this a step further by adding more capabilities
to our grammar.
Extending the example DSL further
---------------------------------
@@ -406,7 +406,7 @@ stop. (Understandable? If you have ever read Russell's "Introduction to
Mathematical Philosophy" you will be used to this kind of prose. Other than
that I find the formal definition easier to understand. However, for learning
EBNF or any other formalism, it helps in the beginning to translate the
meaning of its statements into plain old English.)
There are two subtle mistakes in this grammar. If you can figure them out
just by thinking about it, feel free to correct the grammar right now. (Would
@@ -444,7 +444,7 @@ sentences have been defined, we find that the rule::
    part = { WORD }
defines a part of a sentence as a sequence of *zero* or more WORDs. This
means that a string of length zero also counts as a valid part of a sentence.
Now in order to avoid this, we could write::
@@ -457,7 +457,7 @@ DHParser offers a special syntax for this case::
    part = { WORD }+
(The plus sign "+" must always follow directly after the curly brace "}"
without any whitespace in between, otherwise DHParser won't understand it.)
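
With this amendment the grammar as a whole could look like this (a sketch that
follows the discussion above; the exact wording in "poetry.ebnf" may differ)::

    document = ~ { sentence } EOF
    sentence = part { "," part } "."
    part     = { WORD }+
    WORD     = /\w+/~
    EOF      = !/./
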
At this point the worry may arise that the same problem could reoccur at
another level, if the rule for WORD would match empty strings as well. Let's
quickly add a test case for this to the file
@@ -469,7 +469,7 @@ quickly add a test case for this to the file
Thus, we are sure to be warned in case the definition of rule "WORD" matches
the empty string. Luckily, it does not do so now. But should we change this
definition again later for some reason, we might have forgotten
about this subtlety and introduce the same error again. With a test case we
can reduce the risk of such a regression error. This time the tests run
through nicely. So let's try the parser on our new example::
@@ -478,7 +478,7 @@ through nicely. So let's try the parser on our new example::
    macbeth.dsl:1:1: Error: EOF expected; "Life’s but" found!
That is strange. Obviously, there is an error right at the beginning (line 1
column 1). But what could possibly be wrong with the word "Life"? Now you might
already have guessed what the error is and that the error is not exactly
located in the first column of the first line.
@@ -491,7 +491,7 @@ for the whole document fails to match, the actual error can be located
anywhere in the document! There are different approaches to dealing with this
problem. A tool that DHParser offers is to write log-files that document the
parsing history. The log-files allow you to spot the location where the parsing
error occurred. However, you will have to look for the error manually. A good
starting point is usually either the end of the parsing process or the point
where the parser reached the farthest into the text. In order to receive the
parsing history, you need to run the compiler-script again with the debugging
@@ -505,7 +505,7 @@ subdirectory "LOGS". (Beware that any files in the "LOGS" directory may be
overwritten or deleted by any of the DHParser scripts upon the next run! So
don't store any important data there.) The most interesting file in the
"LGOS"-directory is the full parser log. We'll ignore the other files and just
open the file "macbech_full_parser.log.html" in an internet-browser. As the
open the file "macbeth_full_parser.log.html" in an internet-browser. As the
parsing history tends to become quite long, this usually takes a while, but
luckily not in the case of our short demo example::
@@ -514,14 +514,14 @@ luckily not in the case of our short demo example::
.. image:: parsing_history.png
What you see is a representation of the parsing history. It might look a bit
tedious in the beginning, especially the column that contains the parser
call sequence. But it is all very straightforward: For every application of a
match rule, there is a row in the table. Typically, match rules are applied at
the end of a long sequence of parser calls that is displayed in the third
column. You will recognise the parsers that represent rules by their names,
e.g. "document", "sentence" etc. Those parsers that merely represent
constructs of the EBNF grammar within a rule do not have a name and are
represented by their type, which always begins with a colon, like
":ZeroOrMore". Finally, the regular expression or literal parsers are
represented by the regular expression pattern or the string literal
themselves. (Arguably, it can be confusing that parsers are represented in
@@ -538,7 +538,7 @@ the text that still lies ahead and has not yet been parsed.
In our concrete example, we can see that the parser "WORD" matches "Life", but
not "Life’s" or "’s". And this ultimately leads to the failure of the parsing
process as a whole.
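
A quick check with Python's "re"-module confirms the diagnosis and already
hints at the remedy::

    >>> import re
    >>> re.match(r'\w+', 'Life’s').group()
    'Life'
    >>> re.match(r'[\w’]+', 'Life’s').group()
    'Life’s'
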
The simplest solution would be to add the apostrophe to the list of allowed
characters in a word by changing the respective line in the grammar definition
to ``WORD = /[\w’]+/``. Now, before we even change the grammar we first add
another test case to capture this kind of error. Since we have decided that
"Life’s" should be parsed as a single word, let's open the
@@ -604,7 +604,7 @@ suffice to have the word without whitespace in our tree, and to add whitespace
only later when transforming the tree into some kind of output format. (On the
other hand, it might be convenient to have it in the tree nevertheless...)
Well, the answer to most of these questions is that what our compilation
script yields is more or less the output that the parser yields, which in turn
is the *concrete syntax tree* of the parsed text. Being a concrete syntax tree
it is by its very nature very verbose, because it captures every minute
@@ -616,15 +616,15 @@ all details that seem irrelevant to us. Now, which details we consider as
irrelevant is almost entirely up to ourselves. And we should think carefully
about what features must be included in the abstract syntax tree, because the
abstract syntax tree more or less reflects the data model (or is at most one
step away from it) with which we want to capture our material.
For the sake of our example, let's assume that we are not interested in
whitespace and that we want to get rid of all uninformative nodes, i.e. nodes
that merely demark syntactic structures but not semantic entities.
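
To give a schematic impression of this verbosity: in the concrete syntax tree
even a single word drags its syntactic scaffolding along, roughly like this::

    <WORD>
      <:RegExp>shadow</:RegExp>
      <:Whitespace> </:Whitespace>
    </WORD>
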
DHParser supports the transformation of the concrete syntax tree (CST) into the
abstract syntax tree (AST) with a simple technology that (in theory) allows
one to specify the necessary transformations in an almost declarative fashion:
You
simply fill in a Python-dictionary of tag-names with transformation *operators*.
Technically, these operators are simply Python-functions. DHParser comes with a
rich set of predefined operators. Should these not suffice, you
@@ -672,7 +672,7 @@ what this means and how this works, briefly.
.. caution:: Once the compiler-script "xxxxCompiler.py" has been generated, the
*only* part that is changed after editing and extending the grammar is the
parser-part of this script (i.e. the class derived from class Grammar),
because this part is completely auto-generated and can therefore be
overwritten safely. The other parts of that script, including the
AST-transformation-dictionary, are never changed once they have been
generated, because they need to be filled in by hand by the designer of the
DSL and the
@@ -686,8 +686,8 @@ what this means and how this works, briefly.
which cannot.
We can either specify no operator (empty list), a single operator or a list of
operators for transforming a node. There is a difference between specifying an
empty list for a particular tag-name and leaving out a tag-name completely. In the
latter case the "*"-joker is applied, in place of the missing list of operators.
In the former case only the "+"-joker is applied. If a list of operators is
specified, these operators will be applied in sequence one after the other. We
@@ -695,12 +695,12 @@ also call the list of operators, or the single operator if there is only one, the
*transformation* for a particular tag (or parser name or parser type for that
matter).
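
In our case, the beginnings of such a table might look like this (a sketch:
the operator names anticipate the following paragraphs, and the import path is
an assumption, not verified against a particular DHParser version)::

    from DHParser.transform import (flatten, reduce_single_child,
                                    remove_whitespace)

    poetry_AST_transformation_table = {
        "WORD": [remove_whitespace, reduce_single_child],
        "document": [flatten, reduce_single_child],
    }
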
Because the AST-transformation works through the table from the inside to the
outside, it is reasonable to proceed in the same order when designing the
AST-transformations. The innermost nodes that concern us are the nodes
captured by the <WORD>-parser, or simply, <WORD>-nodes. As we can see, these
nodes usually contain a <:RegExp>-node and a <:Whitespace>-node. As the "WORD"
parser is defined as a simple regular expression followed by optional
whitespace in our grammar, we know that this must always be the case, although
the whitespace may occasionally be empty. Thus, we can eliminate the
uninformative child nodes by removing whitespace first and then reducing the
@@ -730,7 +730,7 @@ Running the "poetryCompiler.py"-script on "macbeth.dsl" again, yields::
    <WORD>a</WORD>
    ...
It starts to become more readable and concise, but there are still some oddities.
Firstly, the Tokens that delimit parts of sentences still contain whitespace.
Secondly, if several <part>-nodes follow each other in a <sentence>-node, the
<part>-nodes after the first one are enclosed by a <:Series>-node or even a
@@ -765,7 +765,7 @@ that depending on the occasion and purpose, such decisions can also be taken
otherwise.)
The only kind of nodes left are the <document>-nodes. In the output of the
compiler-script (see above), the <document>-node had a single child-node
":ZeroOrMore". Since this child node does not have any particular semantic
meaning it would be reasonable to eliminate it and attach its children directly
to "document". We could do so by entering ``reduce_single_child`` in the list of
@@ -786,7 +786,7 @@ node, in which case the "reduce_single_child"-operator would do nothing, because
there is more than a single child. (We could of course also use the
"flatten"-operator, instead. Try this as an exercise.) Test cases help to
capture those different scenarios, so adding test cases and examining the output
in the test report help to get a grip on this, if just looking at the grammar
strains your imagination too much.
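
Schematically, the effect we are after looks like this (sketched with made-up
content)::

    <document>
      <:ZeroOrMore>
        <sentence>...</sentence>
        <sentence>...</sentence>
      </:ZeroOrMore>
    </document>

becomes::

    <document>
      <sentence>...</sentence>
      <sentence>...</sentence>
    </document>
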
Since we have decided that we do not want to include whitespace in our data