XML documents have a hierarchical structure and can conceptually be interpreted as a tree structure, called an XML tree.
XML documents must contain a root element (one that is the parent of all other elements). All elements in an XML document can contain sub-elements, text and attributes. The tree represented by an XML document starts at the root element and branches to the lowest level of elements. Although there is no consensus on the terminology used for XML trees, at least two standard terminologies have been released by the W3C:
XPath defines a syntax named XPath expressions that identifies one or more internal components (elements, attributes, etc.) of an XML document. XPath is widely used to access XML-encoded data.
The XML Information Set, or XML infoset, describes an abstract data model for XML documents in terms of information items. It is often used in the specifications of XML languages, for its convenience in describing constraints on constructs those languages allow.
In mathematics, a tree is an undirected graph in which any two vertices are connected by exactly one simple path. Any connected graph without simple cycles is a tree. A tree data structure simulates a hierarchical tree structure with a set of linked nodes. A hierarchy consists of a preorder defined on a set. The term hierarchy is used to stress a hierarchical relation among the elements.
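The graph-theoretic definition above can be checked mechanically. The following is a minimal Python sketch, not taken from any specification: the function name `is_tree` and its input format (a collection of vertices plus a list of undirected edge pairs) are assumptions made for illustration. It relies on the standard equivalence that a graph on n vertices is a tree exactly when it is connected and has n - 1 edges, and it assumes every edge refers only to listed vertices.

```python
from collections import defaultdict


def is_tree(vertices, edges):
    """Return True if the undirected graph (vertices, edges) is a tree.

    Hypothetical helper for illustration; assumes each edge is a pair of
    vertices that both appear in `vertices`.
    """
    vertices = list(vertices)
    if not vertices:
        return False  # convention assumed here: the empty graph is not a tree

    # A tree on n vertices has exactly n - 1 edges.
    if len(edges) != len(vertices) - 1:
        return False

    # Build an adjacency list for the undirected graph.
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    # Depth-first search from an arbitrary vertex to test connectivity.
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)

    return len(seen) == len(vertices)
```

For example, `is_tree("abcd", [("a", "b"), ("a", "c"), ("c", "d")])` holds, while a triangle on three vertices fails the edge-count test because it contains a cycle.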
The XML specification defines an XML document as a well-formed text if it satisfies a list of syntax rules defined in the specification. The specification is long; however, two key points relating to the tree structure of an XML document are:
These features resemble those of trees, in that there is a single root node and an order to the elements. XML has appeared as a first-class data type in other languages. The ECMAScript for XML (E4X) extension to JavaScript explicitly defines two specific objects (XML and XMLList), which support XML document nodes and XML node lists as distinct objects and use a dot notation to specify parent-child relationships. These data structures represent XML documents as a tree structure.
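The single root and the ordering of elements can be seen by loading a small document into an in-memory tree. The sketch below uses Python's standard `xml.etree.ElementTree` module rather than E4X; the sample document, the `walk` helper and the variable names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one root element with two ordered children.
DOCUMENT = """<library>
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
</library>"""

# fromstring() returns the root element of the parsed tree.
root = ET.fromstring(DOCUMENT)


def walk(element, depth=0):
    """Yield (depth, tag) pairs, parent first, children in textual order."""
    yield depth, element.tag
    for child in element:  # iterating an element visits its children in order
        yield from walk(child, depth + 1)


outline = list(walk(root))
for depth, tag in outline:
    print("  " * depth + tag)
```

Iterating over an element yields its children in the order they appear in the text, so the traversal visits `library` first, then each `book` followed by its `title`.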
XPath, the XML Path Language, is a query language for selecting nodes from an XML document. XPath defines a syntax named XPath expressions that can query an XML document for one or more internal components (elements, attributes, etc.). XPath is widely used in other core-XML specifications and in programming libraries for accessing XML-encoded data.
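As a rough illustration of XPath expressions used from a programming library, the sketch below queries a small document with Python's standard `xml.etree.ElementTree` module. That module supports only a limited subset of XPath and returns elements rather than attribute nodes, so this is not a demonstration of the full language; the sample document and variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for this example.
DOCUMENT = """<library>
  <book id="b1" lang="en"><title>First</title></book>
  <book id="b2" lang="fr"><title>Second</title></book>
</library>"""

root = ET.fromstring(DOCUMENT)

# Path expression: every <title> that is a child of a <book> child of the root.
titles = [title.text for title in root.findall("./book/title")]

# Attribute predicate: <book> elements at any depth whose lang attribute is 'fr'.
french_ids = [book.get("id") for book in root.findall(".//book[@lang='fr']")]

# Positional predicate: the first <book> child of the root (positions start at 1).
first_book = root.find("./book[1]")

print(titles)
print(french_ids)
print(first_book.get("id"))
```

Each expression names a path through the tree from the context node (here the root element), optionally filtered by a predicate in square brackets.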
The XPath Data Model is a long specification and covers many features unrelated to XML trees. Listed below are key excerpts relating to XML tree terminology:
A document order is defined among all the nodes accessible during a given query or transformation. Document order is a total ordering. Informally, document order is the order in which nodes appear in the XML serialization of a document. Within a tree, document order satisfies the following constraints: