XML Tips

Writing and Validating XML

You can use your favorite editor to edit XML documents. The file name should have suffix .xml. For example, emacs and vim both have descent XML modes, which are automatically invoked for file names with .xml suffix. There are also many specialized XML editors available, and some of them come with nice templates and automatic validation. Just search for “XML editor” in Google.

Some XML and DTD examples can be found in /opt/dbcourse/examples/xmark/ on your VM.

To check that your XML document is well-formed (i.e., no synatx errors), use the command
xmllint --noout file.xml
in your VM, where file.xml is the name of the XML document. If the document is well-formed, no output is generated; otherwise, any error will be reported.

If your XML document has a DOCTYPE declaration and you want to validate it against the DTD specified, use the command
xmllint --valid --noout file.xml
The additional --valid flag turns on DTD validation.

If your XML document does not have a DOCTYPE declaration but you still want to validate it against a DTD, use the command
xmllint --dtdvalid spec.dtd --noout file.xml
where spec.dtd is the name of the DTD file.

Most of the Web browser these days also support visualization and validation of XML documents by DTD. Simply open the XML file from a browser like Chrome; it will automatically validate the XML document and display it in a nice format.

To check your XML document against an XSD (XML Schema) file, use the command
xmllint --schema schema.xsd --noout file.xml
where schema.xsd is the name of the XML Schema document and file.xml is the name of the XML document.

XPath, XQuery, and XSLT with Saxon

Saxon is an open-source XPath/XQuery/XSLT engine. To run an XQuery, use your favorite text editor to create a text file, say query.xq, containing the query string, and then issue in your VM shell the command
saxonb-xquery query.xq
The result will be printed to standard output. As an example, you may try a file containing the following XQuery string:

<result>
  {
    for $i in doc("/opt/dbcourse/examples/xmark/auction.xml")//item
    where $i/payment = "Creditcard"
    return $i/name
  }
</result>

Alternatively, you can specify the XML file on the command line with the command
saxonb-xquery -s /opt/dbcourse/examples/xmark/auction.xml query.xq
Doing so allows the following XQuery (note that doc(...) is no longer needed):

<result>
  {
    for $i in //item
    where $i/payment = "Creditcard"
    return $i/name
  }
</result>

To run an XSLT stylesheet, use the command
saxonb-xslt input.xml transform.xslt
where input.xml specifies the name of the input XML file, and transform.xslt specifies the name of the XSLT stylesheet file. The result will be output to standard output.

See Saxon documentation for additional help. In particular, you might find the list of supported functions useful.

Working with XML in Python

Under the directory /opt/dbcourse/examples/xml-python on your VM, you will find two example programs, saxpath.py and dompath.py, which use the SAX API (xml.sax) and DOM API (xml.dom), respectively.