Commit 1387a0b2 authored by Eckhart Arnold

Merge remote-tracking branch 'origin/development' into development

parents c493a67b 40728a80
......@@ -58,13 +58,13 @@ a few drawbacks to this approach:
and closing tags. This takes time...
- While looking for a good XML-Editor, you find that there hardly exist
any XML-Editors any more. (And for a reason, actually...) In
particular, there are no good open source XML-Editors.
any XML-Editors, any more. (And for a reason, actually...) In
particular, there are not many good open source XML-Editors.
On the other hand, there are good reasons why XML is used in the
humanities: Important encoding standards like
[TEI-XML](http://www.tei-c.org/index.xml) are defined in XML. Its strict
syntax and the possibility to check data against schema help to detect
syntax and the possibility to check data against a schema help to detect
and avoiding encoding errors. If the schema is well-defined, it is
unambiguous, and it is easy to parse for a computer. Most of these
advantages, however, are on a technical level and few of them are
......@@ -335,7 +335,7 @@ strictly separated steps:
2. Transformation of the CST into an "abstract syntax tree" (AST)
3. And, finally, compilation of the AST into valid XML, HTML, LaTeX or
what you like.
whatever you like.
DHParser automatically only generates a parser for the very first step.
The other steps have to be programmed by hand, though DHParser tries to
......
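The three-step pipeline named in this hunk (parsing into a concrete syntax tree, transforming it into an abstract syntax tree, compiling the result to XML) can be illustrated with a toy sketch. It does not use DHParser's API: the functions `parse`, `to_ast` and `compile_xml` and the tuple-based tree format are hypothetical and chosen only for illustration.

```python
import re
from xml.sax.saxutils import escape


def parse(text):
    """Step 1: build a toy concrete syntax tree that keeps every token."""
    children = []
    for token in re.findall(r"\w+|[^\w\s]|\s+", text):
        if token.isspace():
            kind = "whitespace"
        elif re.fullmatch(r"\w+", token):
            kind = "word"
        else:
            kind = "punctuation"
        children.append((kind, token))
    return ("sentence", children)


def to_ast(cst):
    """Step 2: drop the nodes that carry no meaning for later processing."""
    name, children = cst
    return (name, [child for child in children if child[0] != "whitespace"])


def compile_xml(ast):
    """Step 3: serialize the abstract syntax tree as XML."""
    name, children = ast
    inner = "".join(
        f"<{kind}>{escape(value)}</{kind}>" for kind, value in children
    )
    return f"<{name}>{inner}</{name}>"


xml_output = compile_xml(to_ast(parse("Life is short.")))
print(xml_output)
```

Keeping the steps separate means each one can be tested and replaced on its own, for example by swapping `compile_xml` for an HTML or LaTeX back end.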
......@@ -10,9 +10,9 @@ Email: arnold@badw.de
License
-------
DHParser is open source software under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0)
DHParser is open source software under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
Copyright 2016-2017 Eckhart Arnold, Bavarian Academy of Sciences and Humanities
Copyright 2016-2018 Eckhart Arnold, Bavarian Academy of Sciences and Humanities
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
......
......@@ -27,7 +27,7 @@ Setting up a new DHParser project
=================================
Since DHParser, while quite mature in terms of implemented features, is still
in a pre-first-release state, it is for the time being more recommendable to
in a pre-first-release state, it is, for the time being, more recommendable to
clone the most current version of DHParser from the git-repository rather than
installing the packages from the Python Package Index (PyPI).
......@@ -67,8 +67,8 @@ In order to verify that the installation works, you can simply run the
"dhparser.py" script and, when asked, chose "3" for the self-test. If the
self-test runs through without error, the installation has succeeded.
Staring a new DHParser project
------------------------------
Starting a new DHParser project
-------------------------------
In order to setup a new DHParser project, you run the ``dhparser.py``-script
with the name of the new project. For the sake of the example, let's type::
......@@ -406,7 +406,7 @@ stop. (Understandable? If you have ever read Russell's "Introduction to
Mathematical Philosophy" you will be used to this kind of prose. Other than
that I find the formal definition easier to understand. However, for learning
EBNF or any other formalism, it helps in the beginning to translate the
meaning of its statements into plain old English.)
meaning of its statements into plain language.)
There are two subtle mistakes in this grammar. If you can figure them out
just by thinking about it, feel free to correct the grammar right now. (Would
......@@ -819,5 +819,5 @@ few anonymous <:Token> nodes. Here is a little exercise: Do away with those
<:Token>-nodes by replacing them by something semantically more meaningful.
Hint: Add a new symbol "delimiter" in the grammar definition "poetry.ebnf". An
alternative strategy to extending the grammar would be to use the
``replace_parser`` operator. Which of the strategy is the better one? Explain
``replace_parser`` operator. Which of the strategies is the better one? Explain
why.
......@@ -9,7 +9,7 @@ Author: Eckhart Arnold <eckhart.arnold@posteo.de>
XML is open source software under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0)
Copyright YEAR AUTHOR'S NAME <EMAIL>, AFFILIATION
Copyright 2018 Eckhart Arnold <arnold@badw.de>, Bavarian Academy of Sciences and Humanities
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
......
......@@ -6,7 +6,7 @@
#
#######################################################################
@ whitespace = /\s*/ # implicit whitespace, signified by ~
@ whitespace = /\s*/ # insignificant whitespace, signified by ~
@ literalws = none # literals have no implicit whitespace
@ comment = // # no implicit comments
@ ignorecase = False # literals and regular expressions are case-sensitive
......
# XMLSnippet-grammar
#######################################################################
#
# EBNF-Directives
#
#######################################################################
@ whitespace = vertical # implicit whitespace, includes any number of line feeds
@ literalws = right # literals have implicit whitespace on the right hand side
@ comment = /#.*/ # comments range from a '#'-character to the end of the line
@ whitespace = /\s*/ # insignificant whitespace, signified by ~
@ literalws = none # literals have no implicit whitespace
@ comment = // # no implicit comments
@ ignorecase = False # literals and regular expressions are case-sensitive
#######################################################################
#
# Structure and Components
# Document Frame and Prolog
#
#######################################################################
document = prolog element [Misc] EOF
prolog = [ ~ XMLDecl ] [Misc] [doctypedecl [Misc]]
XMLDecl = '<?xml' VersionInfo [EncodingDecl] [SDDecl] ~ '?>'
VersionInfo = ~ 'version' ~ '=' ~ ("'" VersionNum "'" | '"' VersionNum '"')
VersionNum = /[0-9]+\.[0-9]+/
EncodingDecl = ~ 'encoding' ~ '=' ~ ("'" EncName "'" | '"' EncName '"')
EncName = /[A-Za-z][A-Za-z0-9._\-]*/
SDDecl = ~ 'standalone' ~ '=' ~ (("'" Yes | No "'") | ('"' Yes | No '"'))
Yes = 'yes'
No = 'no'
#######################################################################
#
# Logical Structures
#
#######################################################################
element = emptyElement | STag §content ETag
STag = '<' TagName { ~ Attribute } ~ '>'
ETag = '</' §::TagName ~ '>'
document = prolog element EOF
prolog = ""
xml = { element | text | comment }
element = single_tag | tag_pair
single_tag = "<" name attributes "/>"
tag_pair = opening_tag xml closing_tag
opening_tag = "<" tag_name attributes ">"
closing_tag = "</" ::tag_name ">"
attributes = { attribute }
attribute = name "=" '"' content '"'
emptyElement = '<' Name { ~ Attribute } ~ '/>'
TagName = Name
Attribute = Name ~ §'=' ~ AttValue
content = [ CharData ]
{ (element | Reference | CDSect | PI | Comment)
[CharData] }
name = IDENTIFIER
tag_name = IDENTIFIER
#######################################################################
#
# Regular Expressions
# Literals
#
#######################################################################
WORD = /\w+/~ # a sequence of letters, optional trailing whitespace
EOF = !/./ # no more characters ahead, end of file reached
EntityValue = '"' { /[^%&"]+/ | PEReference | Reference } '"'
| "'" { /[^%&']+/ | PEReference | Reference } "'"
AttValue = '"' { /[^<&"]+/ | Reference } '"'
| "'" { /[^<&']+/ | Reference } "'"
SystemLiteral = '"' /[^"]*/ '"' | "'" /[^']*/ "'"
PubidLiteral = '"' [PubidChars] '"'
| "'" [PubidCharsSingleQuoted] "'"
#######################################################################
#
# References
#
#######################################################################
Reference = EntityRef | CharRef
EntityRef = '&' Name ';'
PEReference = '%' Name ';'
#######################################################################
#
# Names and Tokens
#
#######################################################################
Nmtokens = Nmtoken { / / Nmtoken }
Nmtoken = NameChars
Names = Name { / / Name }
Name = NameStartChar [NameChars]
NameStartChar = /_|:|[A-Z]|[a-z]
|[\u00C0-\u00D6]|[\u00D8-\u00F6]|[\u00F8-\u02FF]
|[\u0370-\u037D]|[\u037F-\u1FFF]|[\u200C-\u200D]
|[\u2070-\u218F]|[\u2C00-\u2FEF]|[\u3001-\uD7FF]
|[\uF900-\uFDCF]|[\uFDF0-\uFFFD]
|[\U00010000-\U000EFFFF]/
NameChars = /(?:_|:|-|\.|[A-Z]|[a-z]|[0-9]
|\u00B7|[\u0300-\u036F]|[\u203F-\u2040]
|[\u00C0-\u00D6]|[\u00D8-\u00F6]|[\u00F8-\u02FF]
|[\u0370-\u037D]|[\u037F-\u1FFF]|[\u200C-\u200D]
|[\u2070-\u218F]|[\u2C00-\u2FEF]|[\u3001-\uD7FF]
|[\uF900-\uFDCF]|[\uFDF0-\uFFFD]
|[\U00010000-\U000EFFFF])+/
#######################################################################
#
# Comments, Processing Instructions and CDATA sections
#
#######################################################################
Misc = { Comment | PI | S }+
Comment = '<!--' { CommentChars | /-(?!-)/ } '-->'
PI = '<?' PITarget [~ PIChars] '?>'
PITarget = !/X|xM|mL|l/ Name
CDSect = '<![CDATA[' CData ']]>'
#######################################################################
#
# Characters, Explicit Whitespace and End of File
#
#######################################################################
PubidCharsSingleQuoted = /(?:\x20|\x0D|\x0A|[a-zA-Z0-9]|[-()+,.\/:=?;!*#@$_%])+/
PubidChars = /(?:\x20|\x0D|\x0A|[a-zA-Z0-9]|[-'()+,.\/:=?;!*#@$_%])+/
CharData = /(?:(?!\]\]>)[^<&])+/
CData = /(?:(?!\]\]>)(?:\x09|\x0A|\x0D|[\u0020-\uD7FF]|[\uE000-\uFFFD]|[\U00010000-\U0010FFFF]))+/
IgnoreChars = /(?:(?!(?:<!\[)|(?:\]\]>))(?:\x09|\x0A|\x0D|[\u0020-\uD7FF]|[\uE000-\uFFFD]|[\U00010000-\U0010FFFF]))+/
PIChars = /(?:(?!\?>)(?:\x09|\x0A|\x0D|[\u0020-\uD7FF]|[\uE000-\uFFFD]|[\U00010000-\U0010FFFF]))+/
CommentChars = /(?:(?!-)(?:\x09|\x0A|\x0D|[\u0020-\uD7FF]|[\uE000-\uFFFD]|[\U00010000-\U0010FFFF]))+/
CharRef = ('&#' /[0-9]+/ ';') | ('&#x' /[0-9a-fA-F]+/ ';')
Chars = /(?:\x09|\x0A|\x0D|[\u0020-\uD7FF]|[\uE000-\uFFFD]|[\U00010000-\U0010FFFF])+/
Char = /\x09|\x0A|\x0D|[\u0020-\uD7FF]|[\uE000-\uFFFD]|[\U00010000-\U0010FFFF]/
S = /\s+/ # whitespace
EOF = !/./ # no more characters ahead, end of file reached
Life is but a walking shadow
<?xml version="1.0" encoding="UTF-8"?>
<note date="2018-06-14">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body> Don't forget me this weekend! </body>
<priority level="high" />
<remark></remark>
</note>
\ No newline at end of file
......@@ -32,7 +32,7 @@ if __name__ == "__main__":
# print("Running nosetests:")
# os.system("nosetests test")
if platform.system() != "Windows":
interpreters = ['python ', 'pypy3 ', 'python37 ']
interpreters = ['python3 ', 'pypy3 ']
else:
interpreters = ['python.exe ']
......@@ -41,7 +41,7 @@ if __name__ == "__main__":
timestamp = time.time()
with concurrent.futures.ProcessPoolExecutor(4) as pool:
with concurrent.futures.ThreadPoolExecutor(4) as pool:
for interpreter in interpreters:
os.system(interpreter + '--version')
......
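A note on the last hunk: replacing ProcessPoolExecutor with ThreadPoolExecutor is plausible if the pool only launches external interpreter commands, because the actual work then runs in child processes while the worker threads merely wait for them. The sketch below illustrates that pattern under this assumption. The helper `run_command` and the command list are hypothetical and not taken from the script.

```python
import concurrent.futures
import subprocess
import sys


def run_command(command):
    """Run one command in a child process and return its exit code."""
    completed = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return completed.returncode


# Hypothetical commands; the script in the diff builds its own from
# interpreter names such as 'python3 ' and 'pypy3 '.
commands = [[sys.executable, "--version"], [sys.executable, "-c", "pass"]]

# Threads are enough here: each worker blocks on its child process.
with concurrent.futures.ThreadPoolExecutor(4) as pool:
    exit_codes = list(pool.map(run_command, commands))

print(exit_codes)
```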