Bring back Empty End-Tags in XML?

Some notes I wrote in January 2008. Put into blog for storage. Of course, this idea is not practical.

Introduction
Author’s note. I came up with this while staring at an XML document, a Spring context file to be exact. I was calling the idea contextual or anonymous end-tags. Of course, I later learned that this is not a new idea. XML’s predecessor, SGML, used the “empty end tag” and many other minimization techniques. XML did not adopt SGML’s empty end-tag minimization syntax. Are the reasons against empty end-tag minimization still valid today?

Historical Note
When XML was being designed, a list of design goals was set out. The tenth reads:

10. Terseness is of minimal importance.
Minimizing keystrokes is not deemed important in achieving any of the above goals, but other things being equal a concise notation should be preferred to a verbose. — Draft DD-1996-0001 – Design Principles for XML

The historical reason for this goal is that the complexity and difficulty of SGML was greatly increased by its use of minimization, i.e. the omission of pieces of markup, in the interest of terseness. In the case of XML, whenever there was a conflict between conciseness and clarity, clarity won. — Tim Bray, ‘Terseness is of Minimal Importance’, an annotation in Annotated XML

Background
In XML an end tag is defined as: The end of every element that begins with a start-tag must be marked by an end-tag containing a name that echoes the element’s type as given in the start-tag:

End-tag

[42] ETag ::= '</' Name S? '>'

There is also a Well-Formedness Constraint:
Element Type Match
The Name in an element’s end-tag must match the element type in the start-tag.

An example of an end-tag: </img>

Proposal
Proposed change to the ETag:

ETag ::= '</' Name? S? '>'

Well-Formedness Constraint:
Element Type Match
The Name in an element’s end-tag, if present, must match the element type in the start-tag.

In other words, the tag name is optional. This works because XML must be well-formed, so every empty end-tag nests properly with its associated start-tag: it can only close the most recently opened element that is still open. It is also similar to the existing shortcut for elements that contain no content, for example <img src='madonna.gif'/>. Supporting both types of end-tags helps with compatibility with older applications of XML. Many other proposals for XML alternatives are, of course, more radical, but this proposal is meant to continue the XML syntax design.
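To make the nesting argument concrete, here is a minimal Python sketch that rewrites empty end-tags back into ordinary named end-tags using a stack of open elements. The function name `expand_empty_end_tags` and the `TAG` pattern are made up for illustration. The tokenizer is deliberately simplified: it ignores comments, CDATA sections, processing instructions, and any '>' inside attribute values, so it is not a real XML parser.

```python
import re

# Simplified tag pattern (an assumption of this sketch, not the XML grammar):
# group 1 = '/' for end-tags, group 2 = optional name,
# group 3 = attributes/whitespace, group 4 = '/' for empty-element tags.
TAG = re.compile(r"<(/?)([A-Za-z_:][\w.:\-]*)?([^<>]*?)(/?)>")


def expand_empty_end_tags(text: str) -> str:
    """Rewrite every '</>' as a named end-tag, checking Element Type Match."""
    stack = []  # names of the elements that are currently open
    out = []
    pos = 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])  # copy character data unchanged
        pos = m.end()
        closing, name, _rest, self_closing = m.groups()
        if closing:
            if not stack:
                raise ValueError("end-tag without a matching start-tag")
            expected = stack.pop()
            # Proposed constraint: the Name, if present, must match.
            if name and name != expected:
                raise ValueError(f"</{name}> does not match <{expected}>")
            out.append(f"</{expected}>")
        else:
            if name is None:
                raise ValueError("start-tag without a name")
            if not self_closing:
                stack.append(name)  # element stays open until its end-tag
            out.append(m.group(0))
    if stack:
        raise ValueError(f"unclosed element <{stack[-1]}>")
    out.append(text[pos:])
    return "".join(out)


print(expand_empty_end_tags("<img src='madonna.gif'>some content</>"))
```

Because the stack always knows which element is innermost, an empty end-tag is never ambiguous; a named end-tag simply adds a consistency check on top.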

Usage examples

With no content                  With content
<img src='madonna.gif'></img>    <img src='madonna.gif'>some content</img>
<img src='madonna.gif'/>
<img src='madonna.gif'></>       <img src='madonna.gif'>some content</>

It is suggested that non-empty end-tags still be used where human-readable markup is needed, such as in small display-oriented ‘pages’ like XHTML.

Advantages
Todo: Derive a real mathematical expression for the size savings. This is something like Size = Total − Σ n(i)·e(i), and then savings = size × cost/unit.
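Until that expression exists, a rough measurement is easy to sketch. The Python below assumes one reading of the formula that the notes do not spell out: n(i) is the number of named end-tags for element type i, and e(i) is the length of that element's name in characters. The names `END_TAG` and `end_tag_savings` are invented for this example, and the pattern is a simplification that would also count end-tag look-alikes inside comments or CDATA.

```python
import re
from collections import Counter

# Named end-tags only; the name is captured so we can count per element type.
END_TAG = re.compile(r"</([A-Za-z_:][\w.:\-]*)\s*>")


def end_tag_savings(text: str) -> tuple[int, int]:
    """Return (total characters, characters saved by dropping end-tag names)."""
    counts = Counter(END_TAG.findall(text))  # n(i) per element type i
    saved = sum(n * len(name) for name, n in counts.items())  # sum of n(i)*e(i)
    return len(text), saved


doc = "<beans><bean id='a'></bean><bean id='b'></bean></beans>"
total, saved = end_tag_savings(doc)
print(f"{saved} of {total} characters ({saved / total:.0%}) are end-tag names")
```

This counts characters rather than bytes, and says nothing about compressed size, where repeated end-tag names cost far less.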

Disadvantages
Undoubtedly there were good reasons for dropping empty end-tags, such as parser complexity and human readability.

Appendix A. XML Comments
Since we’re on the topic of changes that would break compatibility with SGML, another important change would be to XML comments. The XML production is:

Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

For compatibility, the string “--” (double hyphen) must not occur within comments.

I propose that instead the production be:

Comment ::= '<!--'((Char - '-') | ('-' (Char - '-')))* '--!>'

And the string “--” would be allowed anywhere in comments.
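The difference between the two comment rules can be sketched with a pair of regular expressions in Python. `XML_COMMENT` mirrors the XML 1.0 production. `PROPOSED_COMMENT` follows the stated intent (any text, including a double hyphen, up to the first '--!>') rather than the body of the production as written above, which is an interpretation on this sketch's part. Both patterns ignore the exact Char range, and all names here are hypothetical.

```python
import re

# XML 1.0 rule: no double hyphen inside the body, terminated by '-->'.
XML_COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")

# Proposed rule (as interpreted here): anything up to the first '--!>'.
PROPOSED_COMMENT = re.compile(r"<!--.*?--!>", re.DOTALL)


def is_xml_comment(s: str) -> bool:
    return XML_COMMENT.fullmatch(s) is not None


def is_proposed_comment(s: str) -> bool:
    # Use match() so the lazy body stops at the FIRST terminator,
    # then require that terminator to be the end of the string.
    m = PROPOSED_COMMENT.match(s)
    return m is not None and m.end() == len(s)


for text in ("<!-- a - b -->", "<!-- a -- b -->", "<!-- a -- b --!>"):
    print(repr(text), is_xml_comment(text), is_proposed_comment(text))
```

A distinct terminator is what makes the relaxed body possible: the parser no longer needs the double hyphen itself to signal the end of the comment.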
