An XML database is a data persistence software system that allows data to be specified, and sometimes stored, in XML format. This data can be queried, transformed, exported and returned to a calling system. XML databases are a flavor of document-oriented databases, which are in turn a category of NoSQL database.
There are a number of reasons to directly specify data in XML or other document formats such as JSON. For XML in particular, they include:
Steve O'Connell gives one reason for the use of XML in databases: the increasingly common use of XML for data transport, which has meant that "data is extracted from databases and put into XML documents and vice-versa". It may prove more efficient (in terms of conversion costs) and easier to store the data in XML format. In content-based applications, a native XML database also minimizes the need to extract or enter metadata to support searching and navigation.
XML-enabled databases typically offer one or more of the following approaches to storing XML within the traditional relational structure:
RDBMS that support the ISO XML Type are:
Typically, an XML-enabled database is best suited to datasets where the majority of the data is non-XML. For datasets where the majority of the data is XML, a native XML database is better suited.
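As a rough illustration of the XML-enabled style, the sketch below keeps mostly relational data in ordinary columns and stores one XML fragment per row as plain text. The table name `orders`, the column `details_xml`, and the helper `total_quantity` are hypothetical, SQLite stands in for a relational database, and parsing the XML in application code is only one possible arrangement, not how any particular product works.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: ordinary relational columns plus one text column
# that holds an XML document for each row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, details_xml TEXT)"
)

rows = [
    (1, "Ada", "<order><item sku='A1' qty='2'/><item sku='B7' qty='1'/></order>"),
    (2, "Bob", "<order><item sku='A1' qty='5'/></order>"),
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


def total_quantity(connection, sku):
    """Sum the qty attribute of one SKU across all stored order documents."""
    total = 0
    for (xml_text,) in connection.execute("SELECT details_xml FROM orders"):
        root = ET.fromstring(xml_text)
        # ElementTree supports a limited XPath subset, including
        # attribute-equality predicates such as [@sku='A1'].
        for item in root.findall(f"./item[@sku='{sku}']"):
            total += int(item.get("qty"))
    return total


print(total_quantity(conn, "A1"))  # 7
```

Because the database sees the XML only as text here, every query has to fetch and parse each document in the application. That cost is one reason this style fits best when XML is a small part of the data.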
These databases are typically better when much of the data is in XML or other non-relational formats.
All the above databases use XML as an interface to specify documents as tree-structured data that may contain unstructured text, but on disk the data is stored as "optimized binary files". This makes query and retrieval faster. For MarkLogic, it also allows XML and JSON to co-exist in one binary format.
Key features of native XML databases include:
The W3C-recommended standards for querying XML are XQuery 1.0 and XQuery 3.0. XQuery includes XPath as a sub-language, and XML itself is a valid sub-syntax of XQuery.
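To give a feel for path-based querying, the following sketch runs a simple XPath-style selection with Python's standard library. The sample catalog and the function `titles_for_year` are made up for illustration, and `xml.etree.ElementTree` implements only a subset of XPath, so this approximates what an XML database would evaluate rather than showing a full XQuery engine.

```python
import xml.etree.ElementTree as ET

# Invented sample document, used only for this example.
CATALOG = """
<catalog>
  <book year="2003"><title>XQuery Basics</title><price>30</price></book>
  <book year="2011"><title>Native XML Stores</title><price>45</price></book>
  <book year="2011"><title>Document Databases</title><price>25</price></book>
</catalog>
"""


def titles_for_year(xml_text, year):
    """Return the titles of all books whose year attribute equals `year`."""
    root = ET.fromstring(xml_text.strip())
    # Roughly the XPath expression /catalog/book[@year='2011']/title,
    # evaluated relative to the <catalog> root element.
    books = root.findall(f"./book[@year='{year}']")
    return [book.findtext("title") for book in books]


print(titles_for_year(CATALOG, "2011"))
# ['Native XML Stores', 'Document Databases']
```

In XQuery, a comparable query could be written as `for $b in /catalog/book[@year = "2011"] return $b/title/text()`, with the bracketed path being plain XPath.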
In addition to XPath, XML databases support XSLT as a method of transforming documents or query results retrieved from the database. XSLT provides a declarative language written using an XML grammar. It aims to define a set of XPath filters that can transform documents (in part or in whole) into other formats including plain text, XML, or HTML.
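Python's standard library has no XSLT processor, so the sketch below is a stand-in rather than real XSLT. It imitates what a very small stylesheet might do: select nodes with a path expression and emit them in another format, here an HTML list. The input document and the function `to_html_list` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented input, standing in for a document or query result from the database.
SOURCE = (
    "<catalog>"
    "<book><title>XQuery Basics</title></book>"
    "<book><title>Native XML Stores</title></book>"
    "</catalog>"
)


def to_html_list(xml_text):
    """Turn every /catalog/book/title into an <li> inside a <ul>."""
    root = ET.fromstring(xml_text)
    ul = ET.Element("ul")
    # The path plays the role of an XSLT match/select expression.
    for title in root.findall("./book/title"):
        li = ET.SubElement(ul, "li")
        li.text = title.text
    return ET.tostring(ul, encoding="unicode")


print(to_html_list(SOURCE))
# <ul><li>XQuery Basics</li><li>Native XML Stores</li></ul>
```

An actual XSLT stylesheet would express the same mapping declaratively, as templates matched against XPath patterns, instead of as an explicit loop.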