Commit cd1a166b authored by Eckhart Arnold

syntaxtree.py: docs extended

parent 2cd7c1fa
......@@ -235,12 +235,10 @@ particular nodes within in a tree::
[Node('blank', ' '), Node('blank', ' '), Node('blank', ' ')]
The pick function always picks the first node fulfilling the criterion::
>>> sentence.pick('word')
Node('word', 'This')
Or, reversing the direction::
>>> last_match = sentence.pick('word', reverse=True)
>>> last_match
Node('word', 'Palace')
......@@ -252,6 +250,68 @@ executed ony ancestor of the node::
>>> sentence.find_parent(last_match)
Node('phrase', (Node('word', 'Buckingham'), Node('blank', ' '), Node('word', 'Palace')))
Sometimes, one only wants to select or pick particular children of a node.
Apart from accessing these via `node.children`, there is a tuple-like
access to the immediate children via indices and slices::
>>> sentence[0]
Node('word', 'This')
>>> sentence[-1]
Node('phrase', (Node('word', 'Buckingham'), Node('blank', ' '), Node('word', 'Palace')))
>>> sentence[0:3]
(Node('word', 'This'), Node('blank', ' '), Node('word', 'is'))
>>> sentence.index('blank')
1
>>> sentence.indices('word')
(0, 2)
as well as a dictionary-like access, with the difference that a "key" may
occur several times::
>>> sentence['word']
(Node('word', 'This'), Node('word', 'is'))
>>> sentence['phrase']
Node('phrase', (Node('word', 'Buckingham'), Node('blank', ' '), Node('word', 'Palace')))
Be aware that all matching values are always returned, so the return type
can be either a tuple of Nodes or a single Node! An IndexError is raised
if the "key" does not exist or an index is out of range.
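Because the result of a key lookup may be a single Node or a tuple of Nodes, calling code often needs to normalize it. The following is only a minimal sketch of that idea: the `Node` class here is a simplified stand-in written for this example (not DHParser's implementation), and `as_tuple` is a hypothetical helper.

```python
class Node:
    """Simplified stand-in for illustration only, NOT DHParser's Node class."""

    def __init__(self, tag_name, result):
        self.tag_name = tag_name
        self.result = result  # a str for leaves, a tuple of Nodes otherwise

    @property
    def children(self):
        return self.result if isinstance(self.result, tuple) else ()

    def __getitem__(self, key):
        if isinstance(key, (int, slice)):
            return self.children[key]  # tuple-like access by index or slice
        matches = tuple(c for c in self.children if c.tag_name == key)
        if not matches:
            raise IndexError('no child named %r' % key)
        # one match: a single Node; several matches: a tuple of Nodes
        return matches[0] if len(matches) == 1 else matches


def as_tuple(item):
    """Hypothetical helper: wrap a single Node so callers can always iterate."""
    return item if isinstance(item, tuple) else (item,)


sentence = Node('sentence', (
    Node('word', 'This'), Node('blank', ' '), Node('word', 'is'),
    Node('blank', ' '),
    Node('phrase', (Node('word', 'Buckingham'), Node('blank', ' '),
                    Node('word', 'Palace')))))

words = [nd.result for nd in as_tuple(sentence['word'])]
phrases = [nd.tag_name for nd in as_tuple(sentence['phrase'])]
```

With such a wrapper, loops over `sentence['word']` and `sentence['phrase']` can be written the same way, regardless of how many children match.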
It is also possible to delete children conveniently with Python's `del`-operator::
>>> s_copy = copy.deepcopy(sentence)
>>> del s_copy['blank']; print(s_copy)
ThisisBuckingham Palace
>>> del s_copy[2][0:2]; print(s_copy.serialize())
(sentence (word "This") (word "is") (phrase (word "Palace")))
One can also use the `Node.pick_child()` or `Node.select_children()`-method in
order to select children with an arbitrary condition::
>>> tuple(sentence.select_children(lambda nd: nd.content.find('s') >= 0))
(Node('word', 'This'), Node('word', 'is'))
>>> sentence.pick_child(lambda nd: nd.content.find('i') >= 0, reverse=True)
Node('phrase', (Node('word', 'Buckingham'), Node('blank', ' '), Node('word', 'Palace')))
Often, one is neither interested in selecting from the children of a node, nor
from the entire subtree, but from a certain "depth-range" of a tree-structure.
Say, you would like to pick all words from the sentence that are not inside
a phrase and assume at the same time that words may occur in nested structures::
>>> nested = copy.deepcopy(sentence)
>>> i = nested.index(lambda nd: nd.content == 'is')
>>> nested[i].result = Node('word', nested[i].result)
>>> nested[i].tag_name = 'italic'
>>> nested[0:i + 1]
(Node('word', 'This'), Node('blank', ' '), Node('italic', (Node('word', 'is'))))
Now, in order to select all words on the level of the sentence, but excluding
any sub-phrases, it would not be helpful to use methods based on the selection
of children (i.e. immediate descendants), because the word nested in an
'italic'-Node would be missed. For this purpose the various selection-methods
of class Node have a `skip_subtree`-parameter which can be used to block subtrees
from the iterator based on a criterion (which can be a function, a tag name, a
set of tag names and the like)::
>>> tuple(nested.select('word', skip_subtree='phrase'))
(Node('word', 'This'), Node('word', 'is'))
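To make the effect of blocking subtrees more tangible, here is a small self-contained sketch of a pre-order selection that refuses to descend into subtrees with a given tag name. The `Node` class and its `select` method below are simplified stand-ins written for this example; they only accept plain tag names and are not DHParser's actual implementation.

```python
class Node:
    """Simplified stand-in for illustration only, NOT DHParser's Node class."""

    def __init__(self, tag_name, result):
        self.tag_name = tag_name
        self.result = result  # a str for leaves, a tuple of Nodes otherwise

    @property
    def children(self):
        return self.result if isinstance(self.result, tuple) else ()

    def select(self, tag_name, skip_subtree=None):
        """Yield all descendants named `tag_name` in pre-order, without
        entering any subtree whose root is named `skip_subtree`."""
        for child in self.children:
            if child.tag_name == skip_subtree:
                continue  # block the whole subtree from the iterator
            if child.tag_name == tag_name:
                yield child
            yield from child.select(tag_name, skip_subtree)


nested = Node('sentence', (
    Node('word', 'This'), Node('blank', ' '),
    Node('italic', (Node('word', 'is'),)),
    Node('blank', ' '),
    Node('phrase', (Node('word', 'Buckingham'), Node('blank', ' '),
                    Node('word', 'Palace')))))

all_words = [nd.result for nd in nested.select('word')]
outer_words = [nd.result for nd in nested.select('word', skip_subtree='phrase')]
child_words = [nd.result for nd in nested.children if nd.tag_name == 'word']
```

In this sketch, `all_words` contains every word of the tree, `child_words` misses the word nested inside the 'italic'-node, and only `outer_words` holds exactly the words outside the phrase.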
Navigating "uptree" within the neighborhood and lineage of a node
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......@@ -1576,7 +1636,7 @@ class Node: # (collections.abc.Sized): Base class omitted for cython-compatibil
def pick_context(self, criterion: CriteriaType,
include_root: bool = False,
reverse: bool = False,
- skip_subtree: ContextMatchFunction = NO_CONTEXTS) -> TreeContext:
+ skip_subtree: CriteriaType = NO_CONTEXTS) -> TreeContext:
"""
Like :py:meth:`Node.pick()`, only that the entire context (i.e.
chain of descendants) relative to `self` is returned.
......
*************************
DHParser Reference Manual
*************************
This reference manual explains the technology used by DHParser. It is
intended for people who would like to extend or contribute to
DHParser. The reference manual does not explain how a Domain Specific
Language (DSL) is developed (see the User's Manual for that). Instead, it
explains the technical approach that DHParser employs for parsing,
abstract syntax tree transformation and compilation of a given
DSL. It also describes the module and class structure of the DHParser
software. This manual requires a working knowledge of Python
programming and a basic understanding of common parser technology from
the reader. Also, it is recommended to read the introduction and the
user's guide first.
Fundamentals
============
DHParser is a parser generator aimed at but not restricted to the
creation of domain specific languages in the Digital Humanities (DH),
hence the name "DHParser". In the Digital Humanities, DSLs allow users to
enter annotated texts or data in a human-friendly and readable form
with a text editor. In contrast to the prevailing XML-approach, the
DSL-approach distinguishes between a human-friendly *editing data
format* and a machine-friendly *working data format* which can be XML
but does not need to be. Therefore, the DSL-approach requires an
additional step to reach the *working data format*, that is, the
compilation of the annotated text or data written in the DSL (editing
data format) to the working data format. In the following a text or
data file written in a DSL will simply be called *document*. The
editing data format will also be called *source format* and the
working data format be denoted as *target format*.
Compiling a document specified in a domain specific language involves the following steps:
1. **Parsing** the document which results in a representation of the document as a concrete
syntax tree.
2. **Transforming** the concrete syntax tree (CST) into an abstract syntax tree (AST), i.e. a
streamlined and simplified syntax tree ready for compilation.
3. **Compiling** the abstract syntax tree into the working data format.
All of these steps are carried out by the computer without any user intervention, i.e. without the
need for humans to rewrite or enrich the data during any of these steps. A DSL-compiler therefore
consists of three components which are applied in sequence, a *parser*, a *transformer* and a
*compiler*. Creating, i.e. programming these components is the task of compiler construction.
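The three-stage structure can be illustrated with a deliberately tiny, self-contained sketch. It does not use DHParser at all: the toy "DSL" is just a sequence of words separated by whitespace, syntax trees are plain nested tuples, and the function names `parse`, `transform`, `compile_ast` and `compile_document` are assumptions made for this example.

```python
import re
from xml.sax.saxutils import escape


def parse(document):
    """Step 1: build a concrete syntax tree that still contains
    every character of the source, including the blanks."""
    tokens = re.findall(r'\s+|\S+', document)
    children = [('blank' if tok.isspace() else 'word', tok) for tok in tokens]
    return ('sentence', children)


def transform(cst):
    """Step 2: streamline the CST into an AST by dropping the blanks."""
    tag, children = cst
    return (tag, [child for child in children if child[0] != 'blank'])


def compile_ast(ast):
    """Step 3: compile the AST into the working data format (here: XML)."""
    tag, children = ast
    inner = ''.join('<%s>%s</%s>' % (t, escape(text), t) for t, text in children)
    return '<%s>%s</%s>' % (tag, inner, tag)


def compile_document(document):
    """Apply parser, transformer and compiler in sequence."""
    return compile_ast(transform(parse(document)))


xml = compile_document('This is')
```

Each stage consumes only the output of the previous one, which is why the three components can be developed and tested separately.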
The creation of all of these components is supported by DHParser, albeit to a different degree:
1. *Creating a parser*: DHParser fully automates parser generation. Once the syntax of the DSL
is formally specified, it can be compiled into a python class that is able to parse any
document written in the DSL. DHParser uses Parsing-Expression-Grammars in a variant of the
Extended-Backus-Naur-Form (EBNF) for the specification of the syntax. (See
`examples/EBNF/EBNF.ebnf` for an example.)
2. *Specifying the AST-transformations*: DHParser supports the AST-transformation with a
depth-first tree traversal algorithm (see `DHParser.transform.traverse`) and a number of
stock transformation functions which can also be combined. Most of the AST-transformation is
specified in a declarative manner by filling in a transformation-dictionary which associates
the node-types of the concrete syntax tree with such combinations of transformations. See
`DHParser.ebnf.EBNF_AST_transformation_table` as an example.
3. *Filling in the compiler class skeleton*: Compiler generation cannot be automated like parser
generation. It is supported by DHParser merely by generating a skeleton of a compiler class
with a method-stub for each definition (or "production" as the definitions are sometimes also
called) of the EBNF-specification. (See `examples/EBNF/EBNFCompiler.py`) If the target format
is XML, there is a chance that the XML can simply be generated by serializing the abstract
syntax tree as XML without the need of a dedicated compilation step.
Compiler Creation Workflow
==========================
TODO: Describe:
- setting up a new project
- invoking the DSL Compiler
- conventions and data types
- the flat namespace of DHParser
Component Guide
===============
Parser
------
Parser-creation is supported by DHParser by an EBNF to Python compiler which yields a working
python class that parses any document written in the EBNF-specified DSL to a tree of Node-objects,
which are instances of the `class Node` defined in `DHParser/syntaxtree.py`.
The EBNF to Python compiler is actually a DSL-compiler that has been crafted with DHParser
itself. It is located in `DHParser/ebnf.py`. The formal specification of the EBNF variant
used by DHParser can be found in `examples/EBNF/EBNF.ebnf`. Comparing the automatically
generated `examples/EBNF/EBNFCompiler.py` with `DHParser/ebnf.py` can give you an idea what
additional work is needed to create a DSL-compiler from an autogenerated DSL-parser. In most
DH-projects this task will be less complex, however, as the target format is XML which
usually can be derived from the abstract syntax tree with fewer steps than the Python code in
the case of DHParser's EBNF to Python compiler.
AST-Transformation
------------------
Unlike for the compiler generation (see the next section below), a functional rather than
object-oriented approach has been employed, because it allows for a more concise
specification of the AST-transformation since typically the same combination of
transformations can be used for several node types of the AST. It would therefore be tedious
to fill in a method for each of these. In a sense, the specification of AST-transformation
constitutes an "internal DSL" realized with the means of the Python language itself.
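A rough sketch of this table-driven, functional style is given below. It is self-contained and does not use DHParser: trees are plain dictionaries, and the names `traverse`, `drop_blanks` and `absorb_only_child` are written for this example only and should not be mistaken for DHParser's stock transformation functions.

```python
def leaf(tag, text):
    return {'tag': tag, 'text': text, 'children': []}


def branch(tag, *children):
    return {'tag': tag, 'text': '', 'children': list(children)}


def drop_blanks(node):
    """Remove all immediate children that are blanks."""
    node['children'] = [c for c in node['children'] if c['tag'] != 'blank']


def absorb_only_child(node):
    """If a node has exactly one child, take over that child's content."""
    if len(node['children']) == 1:
        only = node['children'][0]
        node['text'], node['children'] = only['text'], only['children']


def traverse(node, table):
    """Depth-first traversal: children are transformed before their parent."""
    for child in node['children']:
        traverse(child, table)
    for transformation in table.get(node['tag'], ()):
        transformation(node)


# The same combination of transformations serves several node types.
tidy = [drop_blanks, absorb_only_child]
table = {'sentence': tidy, 'phrase': tidy, 'italic': [absorb_only_child]}

tree = branch('sentence',
              leaf('word', 'This'), leaf('blank', ' '),
              branch('italic', leaf('word', 'is')), leaf('blank', ' '),
              branch('phrase', leaf('word', 'Buckingham'), leaf('blank', ' '),
                     leaf('word', 'Palace')))
traverse(tree, table)
tags = [child['tag'] for child in tree['children']]
```

Reusing the list `tidy` for both 'sentence' and 'phrase' is what keeps the specification concise; with one method per node type, the same steps would have to be repeated in each method.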
Compiler
--------
Module Structure of DHParser
============================
Class Hierarchy of DHParser
===========================
......@@ -13,10 +13,7 @@ Welcome to DHParser's documentation!
:caption: Contents:
StepByStepGuide.rst
UserGuide.rst
ReferenceManual.rst
ModuleReference.rst
Manual.rst
Indices and tables
==================
......