COIN78 - XML Lesson 8: XPath and XSL (eXtensible Stylesheet Language)


XPath

XPath is a non-XML query language used to select particular parts of XML documents. XPath lets you write expressions that refer to elements, to attributes within an element, and to other kinds of nodes such as text or the xml-stylesheet processing instructions.

XPath is based on a tree representation of the XML document and provides the ability to navigate the nodes by position, relative position, type, content, and several other criteria. This model is used by XSLT, XQuery, XPointer, and XLink.

[Figure: XPath graph. Image from w3schools.com]

XPath Expressions

XPath expressions navigate the XML document, locate paths, and evaluate to node sets, numbers, booleans, or strings. First, we will describe the expressions that are used to navigate the document. XSLT uses this model in its match patterns and select expressions. We will use the CD catalog for our example.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="cdcatalog_filter.xsl"?>

<catalog>

    <cd>
        <title lang="eng">Empire Burlesque</title>
        <artist>Bob Dylan</artist>
        <price>10.90</price>
    </cd>

    <cd>
        <title lang="eng">Hide your heart</title>
        <artist>Bonnie Tyler</artist>
        <price>9.90</price>
    </cd>

</catalog>

Selecting Nodes

XPath uses path expressions to select nodes in an XML document. A node is selected by following a path, which is a sequence of steps. The most useful path expressions are listed below:

Expression    Description
nodename      Selects all child elements of the current node that are named nodename
/             Selects from the root node
//            Selects matching nodes anywhere below the starting point, no matter how deep they are
.             Selects the current node
..            Selects the parent of the current node
@             Selects attributes

Some path expressions and their results are listed below:

Path Expression    Result
catalog            Selects all catalog elements that are children of the current node
/catalog           Selects the root element catalog
catalog/cd         Selects all cd elements that are children of catalog
//cd               Selects all cd elements no matter where they are in the document
catalog//cd        Selects all cd elements that are descendants of the catalog element, no matter where they are under the catalog element
//@lang            Selects all attributes that are named lang

Note: A path that starts with a slash ( / ) is always an absolute path, evaluated from the root node.
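
As a minimal sketch of how these path expressions behave inside XSLT, the stylesheet below could be applied to the CD catalog shown earlier. It is an illustration only and is not the lesson's cdcatalog_filter.xsl; the text output method and the line-break characters are assumptions made to keep the result easy to read.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain-text output (an assumption of this sketch) -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- catalog/cd is relative: from the root node to catalog, then to each cd child -->
    <xsl:for-each select="catalog/cd">
      <!-- title and artist are evaluated relative to the current cd -->
      <xsl:value-of select="title"/>
      <xsl:text> by </xsl:text>
      <xsl:value-of select="artist"/>
      <!-- title/@lang selects the lang attribute of this cd's title -->
      <xsl:text> (</xsl:text>
      <xsl:value-of select="title/@lang"/>
      <xsl:text>)&#10;</xsl:text>
    </xsl:for-each>
    <!-- //cd finds cd elements anywhere in the document -->
    <xsl:text>Number of cd elements: </xsl:text>
    <xsl:value-of select="count(//cd)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

For the two-CD catalog this should produce one line per CD ("Empire Burlesque by Bob Dylan (eng)", "Hide your heart by Bonnie Tyler (eng)") followed by "Number of cd elements: 2".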

Predicates (boolean expressions)

A predicate is an expression in square brackets that filters the nodes selected by a step. A numeric predicate such as [1] is shorthand for [position()=1].

Path Expression                  Result
/catalog/cd[1]                   Selects the first cd element that is a child of the catalog element
/catalog/cd[last()]              Selects the last cd element that is a child of the catalog element
/catalog/cd[last()-1]            Selects the last but one cd element that is a child of the catalog element
/catalog/cd[position()<3]        Selects the first two cd elements that are children of the catalog element
//title[@lang]                   Selects all the title elements that have an attribute named lang
//title[@lang='eng']             Selects all the title elements that have an attribute named lang with a value of 'eng'
/catalog/cd[price>9.00]          Selects all the cd elements of the catalog element that have a price element with a value greater than 9.00
/catalog/cd[price>9.00]/title    Selects all the title elements of the cd elements of the catalog element that have a price element with a value greater than 9.00

Note: IE5 and later treated [0] as the first node, but according to the W3C standard the first node is [1].
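
The following stylesheet is a small sketch, not part of the original lesson files, that tries a few predicates against the CD catalog. One practical detail it shows: when a predicate is written inside an XSLT attribute value, the less-than sign has to be escaped as &lt;, while the greater-than sign may be written as is.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain-text output (an assumption of this sketch) -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Positions start at 1, so cd[1] is the first cd -->
    <xsl:text>First: </xsl:text>
    <xsl:value-of select="/catalog/cd[1]/title"/>
    <xsl:text>&#10;</xsl:text>

    <!-- last() is the position of the final node in the selection -->
    <xsl:text>Last: </xsl:text>
    <xsl:value-of select="/catalog/cd[last()]/title"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Inside an attribute value, < must be written as &lt; -->
    <xsl:text>Under 10.00: </xsl:text>
    <xsl:value-of select="/catalog/cd[price &lt; 10.00]/title"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

With the sample catalog, the first line should name Empire Burlesque and the other two Hide your heart, since only that CD costs less than 10.00.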

Selecting Unknown Nodes

XPath wildcards can be used to select unknown XML elements.

Wildcard    Description
*           Matches any element node
@*          Matches any attribute node
node()      Matches any node of any kind

Some path expressions and their results are listed below:

Path Expression    Result
/catalog/*         Selects all the child elements of the catalog element
//*                Selects all elements in the document
//title[@*]        Selects all title elements which have any attribute
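
To see the wildcards at work, the sketch below lists every child element of the first cd in the catalog together with any attributes it carries. The output layout (name=value, attributes in square brackets) is invented for this illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain-text output (an assumption of this sketch) -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- * selects every element child of the first cd, whatever its name -->
    <xsl:for-each select="/catalog/cd[1]/*">
      <xsl:value-of select="name()"/>
      <xsl:text>=</xsl:text>
      <xsl:value-of select="."/>
      <!-- @* selects every attribute of the current element -->
      <xsl:for-each select="@*">
        <xsl:text> [</xsl:text>
        <xsl:value-of select="name()"/>
        <xsl:text>=</xsl:text>
        <xsl:value-of select="."/>
        <xsl:text>]</xsl:text>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For the sample catalog this should print three lines: "title=Empire Burlesque [lang=eng]", "artist=Bob Dylan", and "price=10.90".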

Selecting Several Paths

By using the | operator in an XPath expression you can select several paths.
Some path expressions and their results are listed below:

Path Expression                Result
//cd/title | //cd/price        Selects all the title AND price elements of all cd elements
//title | //price              Selects all the title AND price elements in the document
/catalog/cd/title | //price    Selects all the title elements of the cd elements of the catalog element AND all the price elements in the document
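
A short sketch of the | operator in a select expression follows. The nodes of a union are processed in document order, so each title is followed by the price from the same cd rather than all titles coming first.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain-text output (an assumption of this sketch) -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The union of both paths, visited in document order -->
    <xsl:for-each select="//cd/title | //cd/price">
      <xsl:value-of select="name()"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the CD catalog, the lines should alternate: title, price, title, price.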

Unabbreviated Location Paths

Up to this point we have used what are called abbreviated location paths. These paths are much easier to type, less verbose, and more familiar to most people. They also work best for XSLT match patterns.

XPath also offers an unabbreviated syntax for location paths that is more verbose, but perhaps less cryptic and definitely more flexible.

Every location step in a location path has two required parts, an axis and a node test, and one optional part, the predicates. The axis tells you which direction to travel from the context node to look for the next nodes. The node test tells you which nodes to include along that axis, and the predicates further winnow the nodes according to an expression.

axisname::nodetest[predicate]

In an unabbreviated location path, the axis and the node test are separated by a double colon (::). For example, in a document whose people element contains person elements with id attributes, the abbreviated location path people/person/@id is composed of three location steps. The first step selects people element nodes along the child axis, the second selects person element nodes along the child axis, and the third selects id nodes along the attribute axis. When rewritten using the unabbreviated syntax, the same location path is child::people/child::person/attribute::id.

The unabbreviated form is verbose and not used much in practice. XSLT match patterns allow only the child and attribute axes (and the // abbreviation) outside of predicates. However, the unabbreviated form offers one crucial ability that makes it essential to know: it is the only way to access most of the axes from which XPath expressions can choose nodes. The abbreviated syntax lets you walk along only the child, parent, self, attribute, and descendant-or-self axes.

An axis defines a node-set relative to the current node.

AxisName              Result
ancestor              Selects all ancestors (parent, grandparent, etc.) of the current node
ancestor-or-self      Selects all ancestors (parent, grandparent, etc.) of the current node and the current node itself
attribute             Selects all attributes of the current node
child                 Selects all children of the current node
descendant            Selects all descendants (children, grandchildren, etc.) of the current node
descendant-or-self    Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself
following             Selects everything in the document after the closing tag of the current node
following-sibling     Selects all siblings after the current node
namespace             Selects all namespace nodes of the current node
parent                Selects the parent of the current node
preceding             Selects everything in the document that ends before the start tag of the current node (so ancestors are excluded)
preceding-sibling     Selects all siblings before the current node
self                  Selects the current node

The following examples use the CD catalog.

Example                  Result
child::cd                Selects all cd nodes that are children of the current node
attribute::lang          Selects the lang attribute of the current node
child::*                 Selects all element children of the current node
attribute::*             Selects all attributes of the current node
child::text()            Selects all text child nodes of the current node
child::node()            Selects all child nodes of the current node
descendant::cd           Selects all cd descendants of the current node
ancestor::cd             Selects all cd ancestors of the current node
ancestor-or-self::cd     Selects all cd ancestors of the current node, and the current node as well if it is a cd node
child::*/child::price    Selects all price grandchildren of the current node
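
As an illustration of several axes against the CD catalog, the sketch below starts from each artist element and looks sideways and upward from there. It is written for this explanation and is not one of the lesson's stylesheet files.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain-text output (an assumption of this sketch) -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- descendant:: walks down from the root node to every artist element -->
    <xsl:for-each select="descendant::artist">
      <!-- self::artist is the current node, because it is an artist element -->
      <xsl:value-of select="self::artist"/>
      <!-- The sibling axes have no abbreviated form -->
      <xsl:text>: title=</xsl:text>
      <xsl:value-of select="preceding-sibling::title"/>
      <xsl:text>, price=</xsl:text>
      <xsl:value-of select="following-sibling::price"/>
      <!-- parent::* is the long form of .. when the parent is an element -->
      <xsl:text>, parent=</xsl:text>
      <xsl:value-of select="name(parent::*)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For the sample catalog this should print "Bob Dylan: title=Empire Burlesque, price=10.90, parent=cd" and a matching line for Bonnie Tyler.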

The following XSLT stylesheet uses unabbreviated XPath syntax. It expects a document of person elements with name children rather than the CD catalog.

<?xml version="1.0"?> 
<xsl:stylesheet version="1.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
  <xsl:template match="/">
    <dl>
      <xsl:apply-templates select="descendant::person"/>
    </dl>
  </xsl:template>
  
  <xsl:template match="person">
    <dt><xsl:value-of select="child::name"/></dt>
    <dd>
      <ul>
        <xsl:apply-templates select="child::name/following-sibling::*"/>
      </ul>
    </dd>
  </xsl:template>
 
  <xsl:template match="*">
    <li><xsl:value-of select="self::*"/></li>
  </xsl:template>
 
 </xsl:stylesheet>

The first template matches the root node. It applies templates to all descendants of the root node that are person elements. It moves from the root node along the descendant axis with a node test of person.

The second template matches person elements. It places the value of the name child of each person element in a dt element. (The location path used here, child::name, could have been rewritten in the abbreviated syntax as the single word name.) Next, it applies templates to all elements that follow the name element at the same level of the hierarchy. It begins at the context node person element, then moves along the child axis to find the name element. From there it moves along the following-sibling axis looking for elements of any type (*) after the name element that are also children of the same person element. No abbreviated equivalent exists for the following-sibling axis, so this really is the simplest way to make the statement.

The third template matches any element not matched by another template; it simply wraps the value of that element in an li element. The XPath self::* selects the currently matched element, the context node, and xsl:value-of outputs its string value. This expression could have been abbreviated as a single period.

The material in this section is from XML in a Nutshell (O'Reilly) and w3schools.

XPath Functions

XPath provides many functions you may find useful in predicates or in raw expressions. A function returns one of the following four types:

  • boolean
  • number
  • node set
  • string

There are no void functions, and booleans, numbers, and strings cannot be converted to node sets.

Besides the functions defined in XPath, most technologies that use XPath, such as XSLT and XPointer, define more functions that are useful in their particular context. You use these extra functions just like built-in functions when you use those applications. XSLT even lets you write extension functions in Java and other languages that can do almost anything, for example, make SQL queries against a remote database server and return the result of the query as a node set.

The functions fall into four groups:

  • Node-set functions operate on or return information about node sets. Examples: position(), count(), and id().
  • String functions perform basic string operations, such as finding a string's length or changing letters from uppercase to lowercase (in XPath 1.0, with translate()). Examples: starts-with(), contains(), string-length(). Note that ends-with() exists only in XPath 2.0 and later.
  • Boolean functions are few in number and all return a boolean with the value true or false. Examples: true(), boolean(), not().
  • Numeric functions perform simple operations such as summing groups of numbers and finding the nearest integer to a number. Examples: sum(), floor(), ceiling(), round().
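
The sketch below uses one function from each group against the CD catalog, both in predicates and as plain expressions. It assumes an XSLT 1.0 processor and uses only XPath 1.0 functions; the wording of the output is invented for the illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain-text output (an assumption of this sketch) -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Node-set function: count() returns the number of nodes selected -->
    <xsl:text>CDs: </xsl:text>
    <xsl:value-of select="count(/catalog/cd)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- String function in a predicate: keep only cds whose artist contains 'Bob' -->
    <xsl:for-each select="/catalog/cd[contains(artist, 'Bob')]">
      <xsl:value-of select="title"/>
      <xsl:text> has </xsl:text>
      <xsl:value-of select="string-length(title)"/>
      <xsl:text> characters; price rounds to </xsl:text>
      <!-- Numeric function: round() returns the nearest integer -->
      <xsl:value-of select="round(price)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>

    <!-- Boolean function: not() inverts the test, so this counts titles with no lang attribute -->
    <xsl:text>Titles without lang: </xsl:text>
    <xsl:value-of select="count(//title[not(@lang)])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

For the sample catalog the result should be "CDs: 2", then "Empire Burlesque has 16 characters; price rounds to 11", then "Titles without lang: 0".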

For a complete list of the functions visit:

At the end of the XSL lesson I will show you how to use the XPath functions in XSLT.

Matching Elements with XSL

  1. Matching Root Node

    An XSL (eXtensible Stylesheet Language) stylesheet is itself an XML document, and it is not really a stylesheet in the sense that CSS (Cascading Style Sheets) is.

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="xsl1.xsl"?>
    <greeting>hello</greeting>

    An XSL document is more like a map or a filter. It allows us to create a new document by mapping the data from an existing XML document to another XML or HTML document. For now, the conversion from XML to HTML is the focus of our topic.

    XSL works by matching nodes and acting on the nodes that were matched. The first thing we need to match is the root node, which we do with <xsl:template match="/">. From there the processor moves down the structure of our XML document.

    In our example we have created a simple one-element XML document, where we match the document root and tell the XSL processor to return the text value of everything within it (<xsl:value-of select="."/>). In this case, it is just the word "hello".

    The XSL stylesheet:

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    
        <xsl:template match="/">
            <xsl:value-of select="."/>
        </xsl:template>
    
    </xsl:stylesheet>

  2. Value Of Everything

    The value-of element returns the text inside every element below the node it selects, here everything inside the root element. In this case our root element is student, which contains first_name and last_name. As you can see, the content of these elements is returned and displayed as one continuous run of text.

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="xsl1.xsl"?>
    <student>
       <first_name>Dan</first_name>
       <last_name>Cole</last_name>
    </student>

    Note that the element name is value-of; xsl: is, as you recall, the namespace prefix. If you have ever programmed in Java or JavaScript you have inevitably used the object.property syntax. If we were to follow that syntax we would have written xsl.value-of.

    The XSL stylesheet:

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    
        <xsl:template match="/">
            <xsl:value-of select="."/>
        </xsl:template>
    
    </xsl:stylesheet>

  3. Matching Other Elements

    In the examples above we told the processor to begin at the document root and return the data inside all the elements as it walked down the document tree. Instead, we can instruct the processor to match specific elements and perform a different action for each match. In this example we instruct the processor to return only the data within first_name. Notice that a relative element has been added, and inside it are a first_name and a last_name.

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="xsl3.xsl"?>
    <student>
        <first_name>Dan</first_name>
        <last_name>Cole</last_name>
        <relative>
           <first_name>Scott</first_name>
           <last_name>Doe</last_name>
        </relative>
    </student>

    Below is the tree structure of the nodes. The nodes marked (selected) are the ones we will be selecting in this example.

    student
    ├── first_name   (selected)
    ├── last_name
    └── relative
        ├── first_name   (selected)
        └── last_name

    student is the root element. It has three children: first_name, last_name, and relative. relative has two children, first_name and last_name. In this example all of the first_name nodes are selected. In the XSL code we start at the root node, "/", and then apply templates to all the first_name nodes with <xsl:apply-templates select="//first_name"/>. The "//" preceding "first_name" means find every occurrence of first_name anywhere in the document; we will discuss this in more detail later. Once the nodes have been selected, data can be extracted from them by a matching template, in this case <xsl:template match="first_name">. Each time the template matches, the content of first_name is displayed using value-of. Note that we see the first names of both the student and the relative.

    The XSL stylesheet:

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    
        <xsl:template match="/">
            <xsl:apply-templates select="//first_name"/>
        </xsl:template>
    
        <xsl:template match="first_name">
            <xsl:value-of select="."/>
        </xsl:template>
    
    </xsl:stylesheet>

  4. Displaying Last Name

    This is just like the previous example, except that we display the last name instead of the first name. Because the first names in the previous output ran together with no space between them, a &#160; is placed in front of each last name. In HTML this is equivalent to the &nbsp; entity.

    The XSL stylesheet:

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    
        <xsl:template match="/">
            <xsl:apply-templates select="//last_name"/>
        </xsl:template>
    
        <xsl:template match="last_name">
            &#160;<xsl:value-of select="."/>
        </xsl:template>
    
    </xsl:stylesheet>

  5. Displaying First and Last Name

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="xsl4.xsl"?>
    <student>
        <first_name>Dan</first_name>
        <last_name>Cole</last_name>
        <relative>
           <first_name>Scott</first_name>
           <last_name>Doe</last_name>
        </relative>
    </student>

    To display the last name as well, we need to provide another template that matches last_name. In the earlier examples there was only one type of node to find, so the order of the nodes did not matter and the recursive method, "//", could be used. If we selected //first_name and then //last_name here, all of the first names would be output, followed by all of the last names. The selection method must be different so that each first name is output next to its last name. First, we tell the processor where to start its search by setting apply-templates select="/student". No template matches student or relative, so XSLT's built-in rule applies templates to their children in document order. Traversal of the tree from this point on therefore matches the first and last name of the student and then proceeds to the first and last name of the relative.

    The XSL stylesheet:

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    
        <xsl:template match="/">
            <xsl:apply-templates select="/student" />
        </xsl:template>
    
        <xsl:template match="first_name">
            <xsl:value-of select="."/>
        </xsl:template>
    
        <xsl:template match="last_name">
            <xsl:value-of select="."/>
        </xsl:template>
    
    </xsl:stylesheet>

  6. Apply Template To All The Children

    In the examples above, the select attribute of apply-templates inside the root template, <xsl:template match="/">, named the only nodes the processor was allowed to match. If an element has numerous children, listing them could become a laborious effort. To instruct the processor to process all the child nodes, we can simply remove the select attribute (select="SOME PATTERN") from the apply-templates tag.

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="xsl5.xsl"?>
    <student>
        <first_name>Dan</first_name>
        <last_name>Cole</last_name>
        <relative>
            <first_name>Scott</first_name>
            <middle_name>Thomas</middle_name>
            <last_name>Doe</last_name>
        </relative>
    </student>

    First, some explanation is needed. The first instruction in the XSL, <xsl:template match="/">, does not match the root element (the root tag). What it matches is the root node, and the root element is the child of the root node. Until now we did not have to care much about this technical detail, as every element is a descendant of the root node. With <xsl:apply-templates /> we instruct the processor to process all the direct children of the root node. No template matches student, so XSLT's built-in rule moves on to its children. To see just the first name of the student, that node must be matched, and all of the other nodes must be matched by empty templates so that their values are not output.

    The XSL stylesheet:

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    
        <xsl:template match="/">
            <xsl:apply-templates />
        </xsl:template>
    
        <xsl:template match="student/first_name" >
            student: <xsl:value-of select="."/>
        </xsl:template>
    
        <xsl:template match="student/last_name" >
        </xsl:template>
    
        <xsl:template match="relative">
        </xsl:template>
    
    </xsl:stylesheet>
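
The empty templates in the last example are needed because XSLT's built-in rules otherwise copy every text node to the output. A common alternative, sketched here as a variation and not taken from the lesson's xsl5.xsl, is to silence text nodes once and then write templates only for what should be displayed.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!-- Override the built-in rule that copies text nodes to the output -->
    <xsl:template match="text()"/>

    <!-- Only a first_name directly under student produces output;
         value-of reads the text itself, so the rule above does not hide it -->
    <xsl:template match="student/first_name">
        student: <xsl:value-of select="."/>
    </xsl:template>

</xsl:stylesheet>
```

With this approach, adding new elements such as middle_name to the document does not require adding new empty templates, because unmatched elements fall through to the built-in element rule and their text is discarded by the text() template.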

Pattern Matching

  1. Wild Card (*)

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="xsl6.xsl"?>
    <student>
       <first_name>Dan</first_name>
       <last_name>Cole</last_name>
    </student>

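    XSL provides a powerful syntax for matching elements, and no pattern-matching syntax would be complete without a wildcard. In this example, instead of having two separate templates for first_name and last_name, we have one template that matches any element, <xsl:template match="*">. Here it is applied to the student element selected by apply-templates, and value-of outputs the text of everything inside it; the same template would equally match first_name or last_name if they were selected.

    The XSL stylesheet: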

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    
        <xsl:template match="/">
            <xsl:apply-templates select="student"/>
        </xsl:template>
    
        <xsl:template match="*">
            <xsl:value-of select="."/>
        </xsl:template>
    
    </xsl:stylesheet>

  2. Direct Child (/)

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="xsl7.xsl"?>
    <students>
       <student>
          <first_name>Dan</first_name>
          <last_name>Cole</last_name>
          <id>600322456</id>
          <homeworks>
             <homework>
                <id>HW1</id>
                <points>10</points>
             </homework>
          </homeworks>
       </student>
    </students>

    To specify a direct child we use "/" in our pattern syntax. Our XML document has two different id elements, one for the student and the other for the homework. In our XSL document we have specified the path "/students/student/id", which steps from the root node (/) to the root element (students), to its child student, and finally to the id that is a direct child of student. Note the value displayed on the screen: only the student's id is selected, not the homework id.

    The XSL stylesheet:

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    
        <xsl:template match="/">
            <xsl:apply-templates select="/students/student/id"/>
        </xsl:template>
    
        <xsl:template match="id">
            <xsl:value-of select="."/>
        </xsl:template>
    
    </xsl:stylesheet>

  3. Recursive Descent (//)

    What if we want to return all the id elements descending from student, however deep they are? In this example we have used select="students/student//id". The "//" followed by id matches every id descendant of student. Notice that the id directly below students is not selected, since it is not a descendant of student. Compare the values displayed with the previous example.

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="xsl8.xsl"?>
    <students>
       <id>30</id>
       <student>
          <first_name>Dan</first_name>
          <last_name>Cole</last_name>
          <id>600322456</id>
          <homeworks>
             <homework>
                <id>HW1</id>
                <points>10</points>
             </homework>
          </homeworks>
       </student>
    </students>

    The XSL stylesheet:

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

        <xsl:template match="/">
            <xsl:apply-templates select="students/student//id" />
        </xsl:template>

        <xsl:template match="id">
            <xsl:value-of select="." />
        </xsl:template>

    </xsl:stylesheet>

  4. Current Context (.)

    The "." operator indicates the current node. In this example we show only the id elements under homeworks. We first match homeworks and, from that point, instruct the processor to search recursively under this node with ".//". No template matches id, so the built-in rules output its text. Note that we get only the id elements under homeworks, not the student's own id.

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="xsl9.xsl"?>
    <students>
       <student class_name="coin1000">
          <first_name>Dan</first_name>
          <last_name>Cole</last_name>
          <id>600322456</id>
          <homeworks>
             <homework>
                <id>HW1</id>
                <points>10</points>
             </homework>
             <homework>
                <id>HW2</id>
                <points>5</points>
             </homework>
          </homeworks>
       </student>
    </students>

    The XSL stylesheet:

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

        <xsl:template match="/">
            <xsl:apply-templates select="/students/student/homeworks" />
        </xsl:template>

        <xsl:template match="homeworks">
            <xsl:apply-templates select=".//id" />
        </xsl:template>

    </xsl:stylesheet>

  5. Attribute Path Operator (@)

    So far our patterns have matched only on elements. We can also select attributes. In this example select="@class" returns the value of the student's class attribute.

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="xsl10.xsl"?>
    <students>
       <student class="coin1000">
          <first_name>Dan</first_name>
          <last_name>Cole</last_name>
          <id>600322456</id>
          <homeworks>
             <homework>
                <id>HW1</id>
                <points>10</points>
             </homework>
          </homeworks>
       </student>
    </students>

    The XSL stylesheet:

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

        <xsl:template match="/">
            <xsl:apply-templates select="students/student" />
        </xsl:template>

        <xsl:template match="student">
            <xsl:value-of select="@class" />
        </xsl:template>

    </xsl:stylesheet>
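
The operators in this list can be combined in a single expression. The sketch below is a variation written for illustration, not one of the lesson's xsl files. It assumes the students document from the last example, with a class attribute on student, and uses "/", "//", "..", "@", and a predicate together.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!-- Plain-text output (an assumption of this sketch) -->
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- The predicate keeps only students in class coin1000;
             the double slash then finds homework at any depth below them -->
        <xsl:apply-templates select="students/student[@class='coin1000']//homework"/>
    </xsl:template>

    <xsl:template match="homework">
        <!-- ../.. climbs from homework to homeworks to student -->
        <xsl:value-of select="../../@class"/>
        <xsl:text> </xsl:text>
        <!-- id and points are relative to the current homework -->
        <xsl:value-of select="id"/>
        <xsl:text>: </xsl:text>
        <xsl:value-of select="points"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>

</xsl:stylesheet>
```

For the sample document this should print a single line, "coin1000 HW1: 10". Because the template matches homework rather than id, the student's own id is never visited.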

References

Links to XML Related Sites

  1. w3schools XSLT tutorial
  2. XML.COM
  3. WDVL XML tutorial
  4. Sun Java XML Introduction
  5. IBM'S XML Website
  6. Google Directory on XML

 


Copyright © 2009 - 2010 Robert D. Cormia - June 26, 2010