*** Welcome to piglix ***

Overlapping markup


In markup languages and the digital humanities, overlap occurs when a document has two or more structures that interact in a non-hierarchical manner. A document with overlapping markup cannot be represented as a tree. This is also known as concurrent markup. Overlap happens, for instance, in poetry, where there may be a metrical structure of feet and lines; a linguistic structure of sentences and quotations; and a physical structure of volumes and pages and editorial annotations.

The problem of non-hierarchical structures in documents has been recognised since 1988; resolving this problem against the dominant paradigm of text as a single hierarchy (an ordered hierarchy of content objects or OHCO) was initially thought to be merely a technical issue, but has, in fact, proven much more difficult. In 2008, Jeni Tennison identified markup overlap as "the main remaining problem area for markup technologists".

A distinction exists between schemes that allow non-contiguous overlap, and those that allow only contiguous overlap. Often, 'markup overlap' strictly means the latter. Contiguous overlap can always be represented as a linear document with milestones, without the need for fragmentation and pointers to fragments, but non-contiguous overlap may require document fragmentation. Another distinction in overlapping markup schemes is whether elements can overlap with other elements of the same kind (self-overlap).

A scheme may have a privileged hierarchy. Some XML-based schemes, for example, represent one hierarchy directly in the XML document tree, and represent other, overlapping, structures by another means; these are said to be non-privileged.

DeRose (2004, Evaluation criteria) identifies several criteria for judging solutions to the overlap problem: readability and maintainability, tool support and compatibility with XML, possible validation schemes, and ease of processing.

Tag soup is, strictly speaking, not overlapping markup—it is malformed HTML, which is a non-overlapping language, and may be ill-defined. HTML5 defines how processors should deal with such mis-nested markup in the HTML syntax and turn it into a single hierarchy. With XHTML and SGML-based HTML, however, mis-nested markup is a strict error and makes processing by standards-compliant systems impossible.


...
Wikipedia

...