Commit 5aa3acd2 authored by eckhart

some documentation added (still a stub)

parent c90818b7
......@@ -145,7 +145,8 @@ class EBNFGrammar(Grammar):
def grammar_changed(grammar_class, grammar_source: str) -> bool:
"""Returns ``True`` if ``grammar_class`` does not reflect the latest
"""
Returns ``True`` if ``grammar_class`` does not reflect the latest
changes of ``grammar_source``
Parameters:
......
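The staleness check that `grammar_changed` performs can be sketched as a hash comparison against the `source_hash__` attribute that the generated grammar classes carry (visible further down in this diff). A minimal sketch, not DHParser's actual implementation — the choice of md5 here is only suggested by the 32-hex-digit `source_hash__` values below:

```python
import hashlib

def source_hash(grammar_source: str) -> str:
    # Stable fingerprint of the grammar's source text.
    return hashlib.md5(grammar_source.encode('utf-8')).hexdigest()

def grammar_changed(grammar_class, grammar_source: str) -> bool:
    # True if the class was generated from an older version of the source.
    return getattr(grammar_class, 'source_hash__', '') != source_hash(grammar_source)
```

Whether DHParser normalizes the grammar source before hashing is not shown in this diff.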
......@@ -583,7 +583,7 @@ class Node(collections.abc.Sized):
def log(self, log_file_name):
"""
Writes ab S-expressions of the tree with root `self` to a file.
Writes an S-expression-representation of the tree with root `self` to a file.
"""
if is_logging():
path = os.path.join(log_dir(), log_file_name)
......
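For readers unfamiliar with the format: an S-expression representation nests each node's tag name and content in parentheses. A toy stand-in for the real `Node` class (hypothetical, for illustration only — the real class in this diff is considerably richer):

```python
from typing import List, Union

class Node:
    # Minimal parse-tree node: a tag plus either a string (leaf)
    # or a list of child nodes (branch).
    def __init__(self, tag: str, content: Union[str, List['Node']]):
        self.tag = tag
        self.content = content

    def as_sxpr(self) -> str:
        # Leaves render as (tag "text"), branches nest their children.
        if isinstance(self.content, str):
            return '(%s "%s")' % (self.tag, self.content)
        return '(%s %s)' % (self.tag, ' '.join(c.as_sxpr() for c in self.content))
```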
......@@ -17,4 +17,4 @@ permissions and limitations under the License.
"""
__all__ = ('__version__',)
__version__ = '0.7.8' # + '_dev' + str(os.stat(__file__).st_mtime)
__version__ = '0.7.9' # + '_dev' + str(os.stat(__file__).st_mtime)
......@@ -7,7 +7,6 @@ specific languages (DSL) in Digital Humanities projects.
Author: Eckhart Arnold, Bavarian Academy of Sciences
Email: arnold@badw.de
License
-------
......@@ -32,18 +31,16 @@ Python 3.5 source code in order for DHParser to be backwards compatible
with Python 3.4. The module ``DHParser/foreign_typing.py`` is licensed under the
[Python Software Foundation License Version 2](https://docs.python.org/3.5/license.html)
Sources
-------
Find the sources on [gitlab.lrz.de/badw-it/DHParser](https://gitlab.lrz.de/badw-it/DHParser) .
Get them with:
git clone https://gitlab.lrz.de/badw-it/DHParser
Please contact me, if you are interested in contributing to the
development or just using DHParser.
Disclaimer
----------
......@@ -55,14 +52,13 @@ function names changed in future versions. The API is NOT YET STABLE!
Use it for testing and evaluation, but not in a production environment
or contact me first, if you intend to do so.
Purpose
-------
DHParser leverages the power of domain-specific languages for the
Digital Humanities.
Domain-specific languages are widespread in
computer sciences, but seem to be underused in the Digital Humanities.
While DSLs are sometimes introduced to Digital-Humanities-projects as
[practical ad-hoc solution][Müller_2016], these solutions are often
......@@ -76,17 +72,17 @@ parser generators, but employs the more modern form called
recursive descent parser.
Why another parser generator? There are plenty of good parser
generators out there, e.g. [Añez's grako parser generator][Añez_2017],
[Eclipse XText][XText_Website]. However, DHParser is
intended as a tool that is specifically geared towards digital
humanities applications, while most existing parser generators come
from compiler construction toolkits for programming languages.
While I expect DSLs in computer science and DSLs in the Digital
Humanities to be quite similar as far as the technological realization
is concerned, the use cases, requirements and challenges are somewhat
different. For example, in the humanities annotating text is a central
use case, which is mostly absent in computer science treatments.
These differences might sooner or later require developing the
DSL-construction toolkits in a different direction. Also,
DHParser shall (in the future) serve as a teaching tool, which
influences some of its design decisions such as, for example, clearly
......@@ -113,7 +109,7 @@ Further (intended) use cases are:
Mark and Markdown also go beyond what is feasible with pure
EBNF-based parsers.)
* EBNF itself. DHParser is already self-hosting ;-)
* Digital and cross-media editions
* Digital dictionaries
For a simple self-test run `dhparser.py` from the command line. This
......@@ -122,13 +118,11 @@ Python-based parser class representing that grammar. The concrete and
abstract syntax tree as well as a full and abbreviated log of the
parsing process will be stored in a sub-directory named "LOG".
Introduction
------------
see [Introduction.md](https://gitlab.lrz.de/badw-it/DHParser/blob/master/Introduction.md)
References
----------
......@@ -146,22 +140,19 @@ München 2016. Short-URL: [tiny.badw.de/2JVT][Arnold_2016]
[Arnold_2016]: https://f.hypotheses.org/wp-content/blogs.dir/1856/files/2016/12/EA_Pr%C3%A4sentation_Auszeichnungssprachen.pdf
Brian Ford: Parsing Expression Grammars: A Recognition-Based Syntactic
Foundation, Cambridge
Massachusetts, 2004. Short-URL: [http://t1p.de/jihs][Ford_2004]
[Ford_2004]: https://pdos.csail.mit.edu/~baford/packrat/popl04/peg-popl04.pdf
[Ford_20XX]: http://bford.info/packrat/
Richard A. Frost, Rahmatullah Hafiz and Paul Callaghan: Parser
Combinators for Ambiguous Left-Recursive Grammars, in: P. Hudak and
D.S. Warren (Eds.): PADL 2008, LNCS 4902, pp. 167–181, Springer-Verlag
Berlin Heidelberg 2008.
Dominikus Herzberg: Objekt-orientierte Parser-Kombinatoren in Python,
Blog-Post, September, 18th 2008 on denkspuren. gedanken, ideen,
anregungen und links rund um informatik-themen, short-URL:
......@@ -169,7 +160,6 @@ anregungen und links rund um informatik-themen, short-URL:
[Herzberg_2008a]: http://denkspuren.blogspot.de/2008/09/objekt-orientierte-parser-kombinatoren.html
Dominikus Herzberg: Eine einfache Grammatik für LaTeX, Blog-Post,
September, 18th 2008 on denkspuren. gedanken, ideen, anregungen und
links rund um informatik-themen, short-URL:
......@@ -177,17 +167,14 @@ links rund um informatik-themen, short-URL:
[Herzberg_2008b]: http://denkspuren.blogspot.de/2008/09/eine-einfache-grammatik-fr-latex.html
Dominikus Herzberg: Uniform Syntax, Blog-Post, February, 27th 2007 on
denkspuren. gedanken, ideen, anregungen und links rund um
informatik-themen, short-URL: [http://t1p.de/s0zk][Herzberg_2007]
[Herzberg_2007]: http://denkspuren.blogspot.de/2007/02/uniform-syntax.html
[ISO_IEC_14977]: http://www.cl.cam.ac.uk/~mgk25/iso-14977.pdf
John MacFarlane, David Greenspan, Vicent Marti, Neil Williams,
Benjamin Dumke-von der Ehe, Jeff Atwood: CommonMark. A strongly
defined, highly compatible specification of
......@@ -195,7 +182,6 @@ Markdown, 2017. [commonmark.org][MacFarlane_et_al_2017]
[MacFarlane_et_al_2017]: http://commonmark.org/
Stefan Müller: DSLs in den digitalen Geisteswissenschaften,
Präsentation auf dem
[dhmuc-Workshop: Digitale Editionen und Auszeichnungssprachen](https://dhmuc.hypotheses.org/workshop-digitale-editionen-und-auszeichnungssprachen),
......@@ -203,15 +189,15 @@ München 2016. Short-URL: [tiny.badw.de/2JVy][Müller_2016]
[Müller_2016]: https://f.hypotheses.org/wp-content/blogs.dir/1856/files/2016/12/Mueller_Anzeichnung_10_Vortrag_M%C3%BCnchen.pdf
Markus Voelter, Sebastian Benz, Christian Dietrich, Birgit Engelmann,
Mats Helander, Lennart Kats, Eelco Visser, Guido Wachsmuth:
DSL Engineering. Designing, Implementing and Using Domain-Specific Languages, 2013.
[http://dslbook.org/][Voelter_2013]
[voelter_2013]: http://dslbook.org/
[tex_stackexchange_no_bnf]: http://tex.stackexchange.com/questions/4201/is-there-a-bnf-grammar-of-the-tex-language
[tex_stackexchange_latex_parsers]: http://tex.stackexchange.com/questions/4223/what-parsers-for-latex-mathematics-exist-outside-of-the-tex-engines
[XText_website]: https://www.eclipse.org/Xtext/
......@@ -36,9 +36,8 @@ EBNF_TEMPLATE = r"""-grammar
#
#######################################################################
@ testing = True # testing suppresses error messages for unconnected symbols
@ whitespace = vertical # implicit whitespace, includes any number of line feeds
@ literalws = right # literals have implicit whitespace on the right hand side
@ comment = /#.*/ # comments range from a '#'-character to the end of the line
@ ignorecase = False # literals and regular expressions are case-sensitive
......@@ -49,7 +48,7 @@ EBNF_TEMPLATE = r"""-grammar
#
#######################################################################
document = //~ { WORD } §EOF # root parser: optional whitespace followed by a sequence of words
document = //~ { WORD } §EOF # root parser: a sequence of words preceded by whitespace
# until the end of file
#######################################################################
......@@ -58,25 +57,25 @@ document = //~ { WORD } §EOF # root parser: optional whitespace followed by
#
#######################################################################
WORD = /\w+/~ # a sequence of letters, possibly followed by implicit whitespace
EOF = !/./ # no more characters ahead, end of file reached
WORD = /\w+/~ # a sequence of letters, optional trailing whitespace
"""
TEST_WORD_TEMPLATE = r'''[match:WORD]
1 : word
2 : one_word_with_underscores
M1: word
M2: one_word_with_underscores
[fail:WORD]
1 : two words
F1: two words
'''
TEST_DOCUMENT_TEMPLATE = r'''[match:document]
1 : """This is a sequence of words
extending over several lines"""
M1: """This is a sequence of words
extending over several lines"""
[fail:document]
1 : """This test should fail, because neither
comma nor full have been defined anywhere."""
F1: """This test should fail, because neither
comma nor full have been defined anywhere."""
'''
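The test templates use a simple sectioned format: `[match:SYMBOL]` and `[fail:SYMBOL]` headings followed by named test cases. A simplified reader for single-line cases (multi-line `"""…"""` cases as in the document template would need extra handling; this only sketches the format, it is not DHParser's own test reader):

```python
import re

def parse_test_suite(text: str):
    # Map (kind, symbol) -> {test_name: test_input} for a suite in the
    # '[match:WORD]' / 'M1: word' format shown above.
    suite = {}
    section = None
    for line in text.splitlines():
        m = re.match(r'\[(match|fail):(\w+)\]', line)
        if m:
            section = (m.group(1), m.group(2))
            suite[section] = {}
            continue
        m = re.match(r'(\w+)\s*:\s*(.*)', line)
        if m and section is not None:
            suite[section][m.group(1)] = m.group(2)
    return suite
```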
README_TEMPLATE = """# {name}
......@@ -117,14 +116,16 @@ import DHParser.dsl
from DHParser import testing
from DHParser import toolkit
if not DHParser.dsl.recompile_grammar('{name}.ebnf', force=False): # recompiles Grammar only if it has changed
# recompiles Grammar only if it has changed
if not DHParser.dsl.recompile_grammar('{name}.ebnf', force=False):
print('\nErrors while recompiling "{name}.ebnf":\n--------------------------------------\n\n')
with open('{name}_ebnf_ERRORS.txt') as f:
print(f.read())
sys.exit(1)
sys.path.append('./')
# must be appended after module creation, because otherwise an ImportError is raised under Windows
# must be appended after module creation, because
# otherwise an ImportError is raised under Windows
from {name}Compiler import get_grammar, get_transformer
with toolkit.logging(True):
......@@ -135,7 +136,7 @@ if error_report:
print(error_report)
sys.exit(1)
else:
print('\nSUCCESS! All tests passed :-)')
print('ready.')
'''
......@@ -152,7 +153,7 @@ def create_project(path: str):
print('"%s" already exists! Not overwritten.' % name)
if os.path.exists(path) and not os.path.isdir(path):
print('Cannot create new project, because a file named "%s" alread exists!' % path)
print('Cannot create new project, because a file named "%s" already exists!' % path)
sys.exit(1)
name = os.path.basename(path)
print('Creating new DHParser-project "%s".' % name)
......@@ -172,6 +173,7 @@ def create_project(path: str):
create_file(name + '.ebnf', '# ' + name + EBNF_TEMPLATE)
create_file('README.md', README_TEMPLATE.format(name=name))
create_file('tst_%s_grammar.py' % name, GRAMMAR_TEST_TEMPLATE.format(name=name))
os.chmod('tst_%s_grammar.py' % name, 0o755)
os.chdir(curr_dir)
print('ready.')
......@@ -257,5 +259,6 @@ def main():
if not cpu_profile(selftest, 1):
sys.exit(1)
if __name__ == "__main__":
main()
......@@ -66,8 +66,9 @@ class EBNFGrammar(Grammar):
factor = [flowmarker] [retrieveop] symbol !"=" # negative lookahead to be sure it's not a definition
| [flowmarker] literal
| [flowmarker] regexp
| [flowmarker] group
| [flowmarker] oneormore
| [flowmarker] group
| [flowmarker] unordered
| repetition
| option
......@@ -76,6 +77,7 @@ class EBNFGrammar(Grammar):
retrieveop = "::" | ":" # '::' pop, ':' retrieve
group = "(" §expression ")"
unordered = "<" §expression ">" # elements of expression in arbitrary order
oneormore = "{" expression "}+"
repetition = "{" §expression "}"
option = "[" §expression "]"
......@@ -91,7 +93,7 @@ class EBNFGrammar(Grammar):
EOF = !/./
"""
expression = Forward()
source_hash__ = "3c472b3a5d1039680c751fd2dd3f3e24"
source_hash__ = "084a572ffab147ee44ac8f2268793f63"
parser_initialization__ = "upon instantiation"
COMMENT__ = r'#.*(?:\n|$)'
WHITESPACE__ = r'\s*'
......@@ -106,10 +108,11 @@ class EBNFGrammar(Grammar):
option = Series(Token("["), expression, Token("]"), mandatory=1)
repetition = Series(Token("{"), expression, Token("}"), mandatory=1)
oneormore = Series(Token("{"), expression, Token("}+"))
unordered = Series(Token("<"), expression, Token(">"), mandatory=1)
group = Series(Token("("), expression, Token(")"), mandatory=1)
retrieveop = Alternative(Token("::"), Token(":"))
flowmarker = Alternative(Token("!"), Token("&"), Token("-!"), Token("-&"))
factor = Alternative(Series(Option(flowmarker), Option(retrieveop), symbol, NegativeLookahead(Token("="))), Series(Option(flowmarker), literal), Series(Option(flowmarker), regexp), Series(Option(flowmarker), group), Series(Option(flowmarker), oneormore), repetition, option)
factor = Alternative(Series(Option(flowmarker), Option(retrieveop), symbol, NegativeLookahead(Token("="))), Series(Option(flowmarker), literal), Series(Option(flowmarker), regexp), Series(Option(flowmarker), oneormore), Series(Option(flowmarker), group), Series(Option(flowmarker), unordered), repetition, option)
term = OneOrMore(Series(Option(Token("§")), factor))
expression.set(Series(term, ZeroOrMore(Series(Token("|"), term))))
directive = Series(Token("@"), symbol, Token("="), Alternative(regexp, literal, list_), mandatory=1)
......
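The reordering inside `factor` matters because `Alternative` implements PEG-style ordered choice: the first alternative that matches wins and no others are tried. A toy illustration with regular expressions (not DHParser code) of why `oneormore` (`{...}+`) must be tried before `repetition` (`{...}`):

```python
import re

def first_match(text, alternatives):
    # PEG-style ordered choice: return the first alternative whose
    # pattern matches a prefix of the text, plus what it consumed.
    for name, pattern in alternatives:
        m = re.match(pattern, text)
        if m:
            return name, m.group()
    return None, ''

alternatives = [
    ('oneormore',  r'\{[^}]*\}\+'),   # { ... }+
    ('repetition', r'\{[^}]*\}'),     # { ... }
]
```

With this order, `{ WORD }+` is recognized as `oneormore`; with the order reversed, `repetition` would match `{ WORD }` first and leave the `+` dangling.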
# EBNF-Grammar in EBNF
@ comment = /#.*(?:\n|$)/ # comments start with '#' and eat all chars up to and including '\n'
@ whitespace = /\s*/ # whitespace includes linefeed
@ literalws = right # trailing whitespace of literals will be ignored tacitly
syntax = [~//] { definition | directive } §EOF
definition = symbol §"=" §expression
directive = "@" §symbol §"=" §( regexp | literal | list_ )
expression = term { "|" term }
term = { factor }+
factor = [flowmarker] [retrieveop] symbol !"=" # negative lookahead to be sure it's not a definition
| [flowmarker] literal
| [flowmarker] regexp
| [flowmarker] group
| [flowmarker] oneormore
| repetition
| option
flowmarker = "!" | "&" | "§" # '!' negative lookahead, '&' positive lookahead, '§' required
| "-!" | "-&" # '-' negative lookbehind, '-&' positive lookbehind
retrieveop = "::" | ":" # '::' pop, ':' retrieve
group = "(" expression §")"
oneormore = "{" expression "}+"
repetition = "{" expression §"}"
option = "[" expression §"]"
symbol = /(?!\d)\w+/~ # e.g. expression, factor, parameter_list
literal = /"(?:[^"]|\\")*?"/~ # e.g. "(", '+', 'while'
| /'(?:[^']|\\')*?'/~ # whitespace following literals will be ignored tacitly.
regexp = /~?\/(?:\\\/|[^\/])*?\/~?/~ # e.g. /\w+/, ~/#.*(?:\n|$)/~
# '~' is a whitespace-marker, if present leading or trailing
# whitespace of a regular expression will be ignored tacitly.
list_ = /\w+/~ { "," /\w+/~ } # comma separated list of symbols, e.g. BEGIN_LIST, END_LIST,
# BEGIN_QUOTE, END_QUOTE ; see CommonMark/markdown.py for an example
EOF = !/./
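`EOF = !/./` deserves a note: it is a negative lookahead on the regular expression `/./`, so the parser succeeds exactly when no further character can be matched. A sketch of that check (assuming `.` is applied DOTALL-style, so that even a trailing linefeed counts as a remaining character):

```python
import re

ANY_CHAR = re.compile('.', re.DOTALL)

def at_eof(text: str, pos: int) -> bool:
    # EOF = !/./  --  succeed iff no character can be consumed at pos.
    return ANY_CHAR.match(text, pos) is None
```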
line: 8, column: 27, Warning: One mandatory marker (§) sufficient to declare the rest of the series as mandatory.
line: 9, column: 27, Warning: One mandatory marker (§) sufficient to declare the rest of the series as mandatory.
line: 9, column: 32, Warning: One mandatory marker (§) sufficient to declare the rest of the series as mandatory.
File mode changed from 100644 to 100755
......@@ -98,8 +98,8 @@ generic_inline_env = begin_inline_env //~ paragraph §end_inline_env
begin_inline_env = (-!LB begin_environment) | (begin_environment !LFF)
end_inline_env = end_environment
## (-!LB end_environment) | (end_environment !LFF) # ambiguity with generic_block when EOF
begin_environment = /\\begin{/ §NAME §/}/
end_environment = /\\end{/ §::NAME §/}/
begin_environment = /\\begin{/ §NAME /}/
end_environment = /\\end{/ §::NAME /}/
inline_math = /\$/ /[^$]*/ §/\$/
......
......@@ -148,8 +148,8 @@ class LaTeXGrammar(Grammar):
begin_inline_env = (-!LB begin_environment) | (begin_environment !LFF)
end_inline_env = end_environment
## (-!LB end_environment) | (end_environment !LFF) # ambiguity with generic_block when EOF
begin_environment = /\\begin{/ §NAME §/}/
end_environment = /\\end{/ §::NAME §/}/
begin_environment = /\\begin{/ §NAME /}/
end_environment = /\\end{/ §::NAME /}/
inline_math = /\$/ /[^$]*/ §/\$/
......@@ -230,7 +230,7 @@ class LaTeXGrammar(Grammar):
paragraph = Forward()
tabular_config = Forward()
text_element = Forward()
source_hash__ = "a4c1da340e03a51e46030c64f671f1d6"
source_hash__ = "1ded00ed838b03fcffcc6cd4333d4ae0"
parser_initialization__ = "upon instantiation"
COMMENT__ = r'%.*'
WHITESPACE__ = r'[ \t]*(?:\n(?![ \t]*\n)[ \t]*)?'
......
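The `WHITESPACE__` definition of the LaTeX grammar is worth unpacking: `[ \t]*(?:\n(?![ \t]*\n)[ \t]*)?` lets insignificant whitespace span at most one line break, because a blank line marks a paragraph boundary in LaTeX and must not be swallowed. A quick demonstration:

```python
import re

# Whitespace may cross a single linefeed, but the negative lookahead
# (?![ \t]*\n) stops it from crossing into a blank line (paragraph break).
WHITESPACE = re.compile(r'[ \t]*(?:\n(?![ \t]*\n)[ \t]*)?')
```

So `'  \n  x'` is consumed up to the `x`, while in `'  \n\nx'` the match stops before the first linefeed.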
......@@ -54,7 +54,7 @@ This is a subsection about quotations. And here is the quote:
Gerade hierauf beruht jene Glückseligkeit des ersten Viertels unseres Lebens,
in Folge welcher es nachher wie ein verlorenes Paradies hinter uns liegt.
Wir haben in der Kindheit nur wenige Beziehungen und geringe Bedürfnisse, also wenig
Anreung des Willens: der größere Theil unseres Wesens geht demnach im Erkennen auf.
Anregung des Willens: der größere Theil unseres Wesens geht demnach im Erkennen auf.
\cite[199]{Schopenhauer1851}
\end{quote}
......
......@@ -55,7 +55,7 @@ def tst_func():
files = os.listdir('testdata')
files.sort()
for file in files:
if fnmatch.fnmatch(file, '*1.tex') and file.lower().find('error') < 0:
if fnmatch.fnmatch(file, '*2.tex') and file.lower().find('error') < 0:
with open(os.path.join('testdata', file), 'r', encoding='utf-8') as f:
doc = f.read()
print('\n\nParsing document: "%s"\n' % file)
......