Acronyms!
- API: Application Programming Interface
- CSS: Cascading Style Sheets
- DOM: Document Object Model
- DTD: Document Type Definition
- XML: Extensible Markup Language
- HTML: Hypertext Markup Language
- SAX: Simple API for XML
- SOAP: Simple Object Access Protocol
- W3C: World Wide Web Consortium
- XLink: XML Linking Language
- XPointer: XML Pointer Language
- XSD: XML Schema Definition
- XSL: Extensible Stylesheet Language
- XSLT: Extensible Stylesheet Language Transformations
DTD - Document Type Definition
<!ENTITY % addrElements "street | city | state | zip">
<!ELEMENT address (#PCDATA | %addrElements; )*>
<!ELEMENT location (city?, state?, zip?)>
<!ELEMENT street (#PCDATA)>
<!ATTLIST street id ID #IMPLIED>
<!ELEMENT city (#PCDATA)>
<!ATTLIST city id ID #IMPLIED>
<!ELEMENT state (#PCDATA)>
<!ATTLIST state id ID #IMPLIED>
<!ELEMENT zip (#PCDATA)>
<!ATTLIST zip id ID #IMPLIED>
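- *: Zero or more of these elements.
- +: One or more of these elements.
- ?: Zero or one of these elements, i.e. the element is optional.
- #PCDATA: Parsed Character Data, i.e. text content.
- #IMPLIED: The attribute is optional and has no default value. Alternative to #REQUIRED.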
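DTD content models behave a lot like regular expressions over the names of an element’s children. As a rough illustration only (this is not how xmllint actually validates, and the helper name children_match is made up), the Python sketch below writes the location and address models from the DTD above as regexes, with each child tag name followed by a space:

```python
import re
import xml.etree.ElementTree as ET

# <!ELEMENT location (city?, state?, zip?)>
# "?" = zero or one, and the commas mean "in this order".
LOCATION_MODEL = re.compile(r"(city )?(state )?(zip )?")

# <!ELEMENT address (#PCDATA | street | city | state | zip)*>
# "*" = zero or more, in any order; the text (#PCDATA) is not checked here.
ADDRESS_MODEL = re.compile(r"((street|city|state|zip) )*")


def children_match(elem: ET.Element, model: re.Pattern) -> bool:
    """True if the sequence of child tag names fits the content model."""
    names = "".join(child.tag + " " for child in elem)
    return model.fullmatch(names) is not None


if __name__ == "__main__":
    good = ET.fromstring("<location><city>Paris</city><zip>75001</zip></location>")
    bad = ET.fromstring("<location><zip>75001</zip><city>Paris</city></location>")
    print(children_match(good, LOCATION_MODEL))  # True
    print(children_match(bad, LOCATION_MODEL))   # False: wrong order
```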
For a good example of a DTD, just look at the W3C’s DTD for XML Schema itself, which shows that XML Schema isn’t necessarily defined using XML Schema.
Escaping What XML Is Sensitive To
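This can be put in a Bash script just like this, because the newlines inside the quoted sed program are fine: sed treats them, like the semicolons, as command separators. The & substitution has to come first; otherwise it would re-escape the & at the start of every entity produced by the others. (\x27 is GNU sed’s way of writing the apostrophe, which can’t appear inside a single-quoted shell string.)
sed 's/&/\&amp;/g;
s/"/\&quot;/g;
s/</\&lt;/g;
s/>/\&gt;/g;
s/\x27/\&apos;/g'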
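For comparison, Python’s standard library already has this in xml.sax.saxutils.escape, which always handles &, < and > and takes an optional dictionary of extra replacements, here the two quote characters. The hand-rolled second function mirrors the sed script to show why & has to be replaced first. Both function names are just for illustration.

```python
from xml.sax.saxutils import escape

# escape() always handles &, < and >; the quote characters must be requested.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def xml_escape(text: str) -> str:
    """Escape all five XML-special characters using the standard library."""
    return escape(text, QUOTE_ENTITIES)


def xml_escape_manual(text: str) -> str:
    """Same substitutions, in the same order as the sed script."""
    # "&" must come first; otherwise the "&" in "&lt;" etc. would be escaped again.
    for char, entity in [("&", "&amp;"), ('"', "&quot;"), ("<", "&lt;"),
                         (">", "&gt;"), ("'", "&apos;")]:
        text = text.replace(char, entity)
    return text


if __name__ == "__main__":
    sample = 'Tom & Jerry say "<hi>" & it\'s fine'
    print(xml_escape(sample))
    # Tom &amp; Jerry say &quot;&lt;hi&gt;&quot; &amp; it&apos;s fine
```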
Validating XML
Validation first requires that the XML be well-formed, i.e. not messed up in some fundamental way like mismatched tags or bad nesting. Beyond that, it can be checked against the rules, specified in the DTD, for how your particular flavor of XML should be organized. Here’s how I’ve been checking it:
xmllint --valid --noout --dtdvalid hardware.dtd resources.xml
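Python’s standard library can only do the first half of this: xml.etree.ElementTree rejects XML that isn’t well-formed, but it does not validate against a DTD (for that, stick with xmllint, or a third-party library such as lxml). A minimal sketch of the well-formedness check; the function name is hypothetical:

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str) -> tuple[bool, str]:
    """Return (True, "ok") if xml_text parses, else (False, parser message).

    This only catches problems like mismatched tags or bad nesting;
    it does NOT check the document against a DTD.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, "ok"


if __name__ == "__main__":
    print(check_well_formed("<address><city>Paris</city></address>"))
    print(check_well_formed("<address><city>Paris</address></city>"))
```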
If you just need to validate RSS, check out the official RSS validator.
XML Schema
My XML books are from the early 2000s, and I’m not sure what the state of XML is now. It seems that people didn’t like DTDs because they are not written in XML themselves, so a different plan was formed to fix that: XML Schema, a way to define XML schemas using XML itself.
Looks like xmllint can handle this too:
xmllint --noout --schema hardware.xsd resources.xml
I found this to be quite fussy, so the first thing to check is whether the simplest possible thing will validate. Here’s a simple test setup: a schema, simplest.xsd, followed by a document, simplest.xml.
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="name" type="xs:string"/>
</xs:schema>
<?xml version="1.0" encoding="UTF-8" ?>
<name>x</name>
And then check it:
xmllint --noout --schema simplest.xsd simplest.xml
It should say:
simplest.xml validates
A slightly more complicated example, a schema followed by a matching document (dictionary.xml):
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="dictionary" type="DictType"/>
  <xs:complexType name="DictType">
    <xs:sequence>
      <xs:element name="entry" type="EntryType" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="EntryType">
    <xs:sequence>
      <xs:element name="word" type="xs:string" maxOccurs="1" />
      <xs:element name="def" type="xs:string" maxOccurs="unbounded" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
<?xml version="1.0" encoding="UTF-8" ?>
<dictionary>
  <entry>
    <word>quill</word>
    <def>The large strong feather of a goose or other large fowl.</def>
    <def>Formerly a common instrument of writing.</def>
    <def>The spine of a porcupine.</def>
  </entry>
  <entry>
    <word>attribute</word>
    <def>That which is attributed.</def>
    <def>A quality which is considered as belonging to, or inherent in, a person or thing.</def>
    <def>An essential or necessary property or characteristic.</def>
  </entry>
</dictionary>
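This shows named type definitions (DictType and EntryType) used to create containers supporting multiple child tags: a dictionary holds one or more entries, and each entry holds one word followed by one or more definitions.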
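Python’s standard library has no XML Schema validator either, so xmllint (or lxml’s XMLSchema class) is the practical route. Still, to make the rules in this schema concrete, here is a hand-written Python check of the same constraints: the root is dictionary, it holds at least one entry, and each entry is exactly one word followed by one or more def elements (minOccurs defaults to 1 in XML Schema). This is an illustration with a made-up function name, not a general schema validator; it does not check the xs:string content.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def dictionary_errors(xml_text: str) -> list[str]:
    """Hand-coded version of the DictType/EntryType rules; returns problems found."""
    root = ET.fromstring(xml_text)
    if root.tag != "dictionary":
        return [f"root element is <{root.tag}>, expected <dictionary>"]
    entries = list(root)
    errors = []
    if not entries:  # entry has the default minOccurs="1"
        errors.append("dictionary needs at least one <entry>")
    for i, entry in enumerate(entries, start=1):
        if entry.tag != "entry":
            errors.append(f"child {i} is <{entry.tag}>, expected <entry>")
            continue
        tags = [child.tag for child in entry]
        # xs:sequence means order matters: exactly one <word> first ...
        if not tags or tags[0] != "word" or tags.count("word") != 1:
            errors.append(f"entry {i}: needs exactly one <word>, first")
        # ... then one or more <def> (default minOccurs="1", maxOccurs="unbounded").
        if len(tags) < 2 or any(tag != "def" for tag in tags[1:]):
            errors.append(f"entry {i}: needs one or more <def> after <word>")
    return errors


if __name__ == "__main__":
    good = "<dictionary><entry><word>quill</word><def>A feather.</def></entry></dictionary>"
    bad = "<dictionary><entry><def>No word here.</def></entry></dictionary>"
    print(dictionary_errors(good))  # []
    print(dictionary_errors(bad))
```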
Checking Schema Documents
Previously I demonstrated how to check your XML document against a schema. Since XML Schema files are themselves XML, it should be possible to validate them too. The question is, against what? What are the official rules for writing an XML Schema file? Here is how it is done, using the W3C’s own rule definition file for schemas:
xmllint --noout --dtdvalid http://www.w3.org/2001/XMLSchema.dtd myown.xsd
If that’s not weird enough for you, you can also validate schema definition files using an "XML Schema schema document":
wget http://www.w3.org/2001/XMLSchema.xsd
xmllint --noout --schema XMLSchema.xsd myown.xsd
This seems to take a lot longer, and I wasn’t able to get xmllint to pull the schema in off the Internet, hence the wget.
Of course, if you have a schema designed to specify how schemas are composed, you should be able to check whether that schema follows its own rules by validating it against itself.
xmllint --noout --schema XMLSchema.xsd XMLSchema.xsd
If this doesn’t work, you should ask for your money back.
XSL - Extensible Stylesheet Language
Note that XSLT stands for XSL Transformations, the part of the more general XSL that handles transforming documents. The input is data in an XML structure. The output is that data transformed, presumably to better accommodate some output format or purpose; it is often XML, but it can also be HTML or plain text, as in the examples below.
Building on the "simplest" examples above, here is the simplest stylesheet (simplest.xsl) for this kind of document:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="name">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
When this is applied with:
xsltproc simplest.xsl simplest.xml > simplest.html
Note: Seems like only weird people like me use command lines. With XSL, it seems that browsers can do this natively. Check out: https://www.w3schools.com/xml/xsl_client.asp
The resulting HTML file looks like this:
<html><body>x</body></html>
Here is a more complicated example using the dictionary example from above.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- No idea why blank lines were popping out. This fixed it.
       (The whitespace-only text between entries was being copied
       to the output by the built-in text template.) -->
  <xsl:strip-space elements="dictionary"/>
  <xsl:template match="dictionary">
    <html><body>
      <h3>Dictionary</h3>
      <xsl:apply-templates>
        <xsl:sort select="word"/>
      </xsl:apply-templates>
    </body></html>
  </xsl:template>
  <xsl:template match="entry">
    <p>
      <b><xsl:value-of select="word"/></b>: <xsl:for-each select="def">
        <xsl:number value="position()"/>
        <xsl:text>. </xsl:text>
        <xsl:value-of select="."/>
        <xsl:text> 
</xsl:text>
      </xsl:for-each>
    </p>
  </xsl:template>
</xsl:stylesheet>
Note that this line just produces a simple hard return in the output.
<xsl:text> 
</xsl:text>
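This is very helpful when degunking a big XML database into a sensible text file (and not necessarily making HTML or other such messes). For plain text output, it also helps to put <xsl:output method="text"/> in the stylesheet so that no XML declaration is written at the top.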
When applied to dictionary.xml above, this stylesheet produces:
<html><body>
<h3>Dictionary</h3>
<p><b>attribute</b>: 1. That which is attributed.
2. A quality which is considered as belonging to, or inherent in, a person or thing.
3. An essential or necessary property or characteristic.
</p>
<p><b>quill</b>: 1. The large strong feather of a goose or other large fowl.
2. Formerly a common instrument of writing.
3. The spine of a porcupine.
</p>
</body></html>
This looks like this in a browser:
Dictionary
attribute: 1. That which is attributed. 2. A quality which is considered as belonging to, or inherent in, a person or thing. 3. An essential or necessary property or characteristic.
quill: 1. The large strong feather of a goose or other large fowl. 2. Formerly a common instrument of writing. 3. The spine of a porcupine.
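Python’s standard library has no XSLT processor (lxml has one, via its XSLT class, if you want to run the stylesheet itself from Python). To show what the dictionary stylesheet is doing step by step, here is a rough Python equivalent built with xml.etree.ElementTree: sort the entries by word, bold the word, and number each definition. It is a sketch with a made-up function name, and its whitespace will not match xsltproc’s output exactly.

```python
import xml.etree.ElementTree as ET


def dictionary_to_html(xml_text: str) -> str:
    """Rough Python equivalent of the dictionary stylesheet (not XSLT)."""
    root = ET.fromstring(xml_text)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h3").text = "Dictionary"
    # Like <xsl:sort select="word"/>: order the entries by their word.
    entries = sorted(root.findall("entry"), key=lambda e: e.findtext("word", default=""))
    for entry in entries:
        p = ET.SubElement(body, "p")
        b = ET.SubElement(p, "b")
        b.text = entry.findtext("word", default="")
        # Like the for-each over def with <xsl:number value="position()"/>.
        numbered = [f"{n}. {d.text or ''}" for n, d in enumerate(entry.findall("def"), start=1)]
        b.tail = ": " + " ".join(numbered)
    return ET.tostring(html, encoding="unicode", method="html")


if __name__ == "__main__":
    sample = ("<dictionary>"
              "<entry><word>quill</word><def>A feather.</def><def>A pen.</def></entry>"
              "<entry><word>attribute</word><def>A quality.</def></entry>"
              "</dictionary>")
    print(dictionary_to_html(sample))
```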
Controlling From The Command Line
Let’s say that you want to run your stylesheet against your XML data files in slightly different ways: not different enough to deserve completely different stylesheets, just some minor adjustments. You can pass in variable settings on the command line and then respond to them in the XSL logic. Here’s a complete example.
:->(4.6ms)[x:~/t/site/xsltest]$ cat awards.xml
<?xml version="1.0" encoding="UTF-8"?>
<resume>
<achievement priority="NONE">
<event>Finisher Boston Potato Race</event>
<description>The priority is unspecified.</description>
</achievement>
<achievement priority="3">
<event>Bronze Medal Race For The Disease</event>
<description>The priority is set to 3.</description>
</achievement>
<achievement priority="1">
<event>Gold Medal Olympic Armadillo Sled</event>
<description>The priority is set to 1.</description>
</achievement>
<achievement priority="2">
<event>Silver Medal French Taunting</event>
<description>The priority is set to 2.</description>
</achievement>
<achievement priority="NONE">
<event>Last Place Finisher Race To The Bottom</event>
<description>The priority is unspecified.</description>
</achievement>
</resume>
:->(4.4ms)[x:~/t/site/xsltest]$ cat awards.xsl
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/xhtml">
<xsl:param name="verbosity"/>
<xsl:template match="/">
Palmares
========
<xsl:for-each select="resume/achievement">
<xsl:if test="@priority &lt;= $verbosity">
<xsl:value-of select="event"/>
<xsl:value-of select="description"/>
<xsl:text> 
</xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
:->(4.4ms)[x:~/t/site/xsltest]$ xsltproc --stringparam verbosity 5 awards.xsl awards.xml
<?xml version="1.0"?>
Palmares
========
Bronze Medal Race For The DiseaseThe priority is set to 3.
Gold Medal Olympic Armadillo SledThe priority is set to 1.
Silver Medal French TauntingThe priority is set to 2.
Note the --stringparam setting that passes in the value. Unfortunately, if the tag does not have a usable priority attribute (in this case I used priority="NONE" for that), then it won’t show up, and I do not know how to get a more complex expression to include the untagged ones. Still, this is a good start for simple things.
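One possible way around the untagged problem (a suggestion, not something tested in the transcript above) is to flip the test around to test="not(@priority > $verbosity)". In XPath 1.0 a non-numeric priority such as "NONE" converts to NaN, and a missing attribute is an empty node-set; both make the > comparison false, so not() lets them through, while numeric priorities above the limit are still dropped. The same selection logic is sketched in Python below so it can be tested; the function names are made up, and float() only roughly matches XPath’s number().

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def xpath_number(value: str | None) -> float:
    """Rough stand-in for XPath 1.0 number(): anything non-numeric becomes NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return float("nan")


def selected_events(xml_text: str, verbosity: str) -> list[str]:
    """Events that pass the test not(@priority > $verbosity)."""
    root = ET.fromstring(xml_text)
    limit = xpath_number(verbosity)
    kept = []
    for achievement in root.findall("achievement"):
        priority = xpath_number(achievement.get("priority"))  # get() is None if missing
        # Any comparison with NaN is False, so "NONE" and a missing
        # priority both make the > test False, and not() keeps them.
        if not (priority > limit):
            kept.append(achievement.findtext("event", default=""))
    return kept
```

With the contents of awards.xml above and a verbosity of "5", this should keep all five events, including the two marked NONE.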