Commit 465a0d21 authored by Eckhart Arnold

- syntaxtree.py: Node.content now caches results

parent cf2daf3d
......@@ -9,7 +9,7 @@ The best (and easiest) way to contribute at this stage is to try to implement
a small DSL with DHParser and report bugs and problems and make suggestions
for further development. Have a look at the README.md-file to get started.
Please the code from the git repository. Because code still changes quickly,
Please, use the code from the git repository. Because code still changes quickly,
any prepackaged builds may be outdated. The repository is here:
https://gitlab.lrz.de/badw-it/DHParser
......@@ -58,10 +58,35 @@ that would raise an error message, if some parser matches at a place
where it really shouldn't. [Add some examples here.]
Optimizations
-------------
Optimization and Enhancement: Two-way-Traversal for AST-Transformation
----------------------------------------------------------------------
**Early discarding of nodes**:
AST-transformations are done via a depth-first tree-traversal, that is,
the traversal function first goes all the way up the tree to the leaf
nodes and calls the transformation routines successively on the way
down. The routines are picked from the transformation-table which is a
dictionary mapping Node's tag names to sequences of transformation functions.
The
rationale for depth-first is that it is easier to transform a node, if
all of its children have already been transformed, i.e. simplified.
However, there are quite a few cases where depth-last would be better.
For example if you know you are going to discard a whole branch starting
from a certain node, it is a waste to transform all the child nodes
first.
As the tree is traversed anyway, there is no good reason why certain
transformation routines should not already be called on the way up.
Of course, as most routines
more or less assume depth-first, we would need two transformation tables:
one for the routines that are called on the way up, and one for the
routines that are called on the way down.
This should be fairly easy to implement.
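A minimal sketch of such a two-table traversal might look as follows. It is not DHParser's implementation: the `Node` class, the function `traverse_two_way` and the table names `early_table` and `late_table` are all hypothetical and only serve to illustrate the idea. Routines in `early_table` run on the way up to the leaves (before the children are visited), routines in `late_table` run on the way back down (after the children have been transformed).

```python
from typing import Callable, Dict, List, Sequence


class Node:
    """Hypothetical minimal tree node; NOT DHParser's Node class."""

    def __init__(self, tag_name: str, children: Sequence["Node"] = ()):
        self.tag_name = tag_name
        self.children: List["Node"] = list(children)


Transformation = Callable[[Node], None]
Table = Dict[str, Sequence[Transformation]]


def traverse_two_way(node: Node, early_table: Table, late_table: Table) -> None:
    """Apply early routines before and late routines after visiting the children."""
    # On the way towards the leaves: the node's children are still untransformed.
    for transform in early_table.get(node.tag_name, ()):
        transform(node)
    # Iterate over a copy, because an early routine may have changed the children.
    for child in tuple(node.children):
        traverse_two_way(child, early_table, late_table)
    # On the way back: all remaining children have already been transformed.
    for transform in late_table.get(node.tag_name, ()):
        transform(node)


def drop_children(node: Node) -> None:
    """Early routine: discard the whole branch below this node."""
    node.children = []


visited: List[str] = []


def record(node: Node) -> None:
    """Late routine that merely records which nodes were transformed."""
    visited.append(node.tag_name)


tree = Node("document", [Node("comment", [Node("word"), Node("word")]),
                         Node("word")])
early = {"comment": [drop_children]}
late = {"document": [record], "comment": [record], "word": [record]}
traverse_two_way(tree, early, late)
print(visited)
```

In this toy example the two `word` nodes below `comment` are discarded before any late routine is called on them, so only one `word` node is ever transformed.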
Optimization: Early discarding of nodes
---------------------------------------
Reason: `traverse_recursive` and `Node.result-setter` are top time consumers!
Allow to specify parsers/nodes, the result of which
......
......@@ -203,10 +203,10 @@ class HistoryRecord:
FAIL = "FAIL"
Snapshot = collections.namedtuple('Snapshot', ['line', 'column', 'stack', 'status', 'text'])
COLGROUP = '<colgroup>\n<col style="width:2%"/><col style="width:2%"/><col style="width:75"/>' \
'<col style="width:6%"/><col style="width:15%"/>\n</colgroup>'
HEADINGS = ('<tr><th>L</th><th>C</th><th>parser calling sequence</th>'
'<th>success</th><th>text to parse</th></tr>')
COLGROUP = '<colgroup>\n<col style="width:2%"/><col style="width:2%"/><col ' \
'style="width:75%"/><col style="width:6%"/><col style="width:15%"/>\n</colgroup>'
HEADINGS = ('<tr><th>L</th><th>C</th><th>parser call sequence</th>'
'<th>success</th><th>text matched or failed</th></tr>')
HTML_LEAD_IN = ('<!DOCTYPE html>\n'
'<html>\n<head>\n<meta charset="utf-8"/>\n<style>\n'
'td,th {font-family:monospace; '
......@@ -289,7 +289,7 @@ class HistoryRecord:
@property
def stack(self) -> str:
return "->".join((p.repr if p.ptype == ':RegExp' else p.name or p.ptype)
return "->".join((p.repr if p.ptype in {':RegExp', ':PlainText'} else p.name or p.ptype)
for p in self.call_stack)
@property
......
......@@ -891,6 +891,9 @@ class PlainText(Parser):
return Node(self, self.text, True), text[self.len:]
return None, text
def __repr__(self):
return ("'%s'" if self.text.find("'") < 0 else '"%s"') % self.text
class RegExp(Parser):
r"""
......
......@@ -425,9 +425,30 @@ located in the first column of the first line.
Unfortunately, DHParser - like almost any other parser out there - is not
always very good at spotting the exact location of an error. Because rules
refer to other rules, a rule may fail to parse - or, what is just as bad -
succeed to parse where it should indeed fail - as a consequence of an error in
the definition of one of the rule's it refers to. But this means, if the rule for the whole document fails for match, the error can be located anywhere in the document!
refer to other rules, a rule may fail to parse - or, what is just as bad,
succeed to parse when it should indeed fail - as a consequence of an error in
the definition of one of the rules it refers to. But this means if the rule
for the whole document fails to match, the actual error can be located
anywhere in the document! There are different approaches to dealing with this
problem. A tool that DHParser offers is to write log-files that document the
parsing history. The log-files allow you to spot the location where the parsing
error occurred. However, you will have to look for the error manually. A good
starting point is usually either the end of the parsing process or the point
where the parser reached the farthest into the text. In order to receive the
parsing history, you need to run the compiler-script again with the debugging
option::
$ python poetryCompiler.py macbeth.dsl
You will receive the same error messages as before, but this time various kinds of debugging information have been written into a newly created subdirectory "LOGS". (Beware that any files in the "LOGS" directory may be overwritten or deleted by any of the DHParser scripts upon the next run!
So don't store any important data there.) The most interesting file in the "LOGS"-directory is the full parser log. We'll ignore the other files and just open the file "macbeth_full_parser.log.html" in an internet-browser. As the parsing history tends to become quite long, this usually takes a while, but luckily not in the case of our short demo example::
$ firefox LOGS/macbeth_full_parser.log.html &
..picture parsing_history.png
What you see is a representation of the parsing history. It might look a bit tedious in the beginning, especially the third column that contains the parser call sequence. But it is all very straightforward: For every application of a match rule, there is a row in the table. Typically, match rules are applied at the end of a long sequence of parser calls that is displayed in the third column.
The first two columns display the position in the text in terms of lines and columns.
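The advice to start looking where the parser reached farthest into the text can be illustrated with a small sketch. It reuses the field names of the `Snapshot` tuple from `HistoryRecord`, but the helper `farthest`, the sample records, the parser names in the call sequences and the status string "MATCH" are made up for this example and are not read from a real log.

```python
import collections

# Same field names as HistoryRecord.Snapshot; the data below is invented.
Snapshot = collections.namedtuple(
    'Snapshot', ['line', 'column', 'stack', 'status', 'text'])


def farthest(history):
    """Return the record with the greatest (line, column) position."""
    return max(history, key=lambda snap: (snap.line, snap.column))


history = [
    Snapshot(1, 1, 'document->verse->word', 'MATCH', 'Life'),
    Snapshot(1, 6, 'document->verse->word', 'MATCH', 'is'),
    Snapshot(2, 4, 'document->verse->word', 'FAIL', '42'),
    Snapshot(1, 9, 'document->verse', 'FAIL', 'but'),
]

best = farthest(history)
print(best.line, best.column, best.status, best.stack)
```

The record found this way is only a starting point for the manual search; the actual mistake may well sit in a rule that was applied earlier.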
Controlling abstract-syntax-tree generation
......