COIN78 - XML WEEK 10: XLink, XPointer, and XML Sitemaps



XLink

XLink defines a standard way of linking XML documents. It provides two kinds of hyperlinks. Simple links offer functionality similar to HTML links. Extended links define a relationship between two or more documents. XLink can also tie a document to an external link database (a linkbase) that contains a list of links, and the linkbase can be loaded automatically.

[Image: XPath Graph, from w3schools.com]

 

The model defined in the XLink specification shares with HTML the use of URI technology, but goes beyond HTML in offering features, previously available only in dedicated hypermedia systems, that make hyperlinking more scalable and flexible. Along with providing linking data structures, XLink provides a minimal link behavior model; higher-level applications layered on XLink will often specify alternate or more sophisticated rendering and processing treatments.

Below is a table of the XLink attributes.

Attribute       Value       Description
--------------  ----------  -----------------------------------------------
xlink:actuate   onLoad      Defines when the linked resource is read and shown
                onRequest
                other
                none

xlink:href      URL         The URL to link to

xlink:show      embed       Where to open the link. Replace is default
                new
                replace
                other
                none

xlink:type      simple      a simple link
                extended    an extended, possibly multi-resource, link
                locator     a pointer to an external resource
                arc         a traversal rule between resources
                resource    an internal resource
                title       a descriptive title for another linking element
                none

The example below creates two simple links. A simple link works like an HTML link: the reader 'clicks' here to go there. The document must declare the XLink namespace, http://www.w3.org/1999/xlink. On the first link, the attribute xlink:show="new" asks for the link to open in a new window after it has been clicked.

<?xml version="1.0"?>

<homepages xmlns:xlink="http://www.w3.org/1999/xlink">

    <homepage xlink:type="simple"
              xlink:href="http://www.w3schools.com"
              xlink:show="new">Visit W3Schools
    </homepage>

    <homepage xlink:type="simple"
              xlink:href="http://www.w3.org">Visit W3C
    </homepage>

</homepages>    

The attribute xlink:show may also be set to embed, which means that the resource will be processed inline within the page. If this resource is another XML document, this provides a way of building a hierarchy of XML documents.

You may also specify when the resource should appear by setting the xlink:actuate attribute. When set to onLoad, the resource will appear when the document page is loaded. When set to onRequest, the resource is not read or shown before the link is clicked.
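
As a minimal sketch of these two attributes working together, the fragment below embeds one resource when the document loads and defers another until the reader asks for it. The element names (report, chart, details) and the file names are hypothetical, and the actual behavior depends on whether the application reading the document supports XLink.

<?xml version="1.0"?>

<report xmlns:xlink="http://www.w3.org/1999/xlink">

    <!-- Embedded inline as soon as the document is loaded -->
    <chart xlink:type="simple"
           xlink:href="sales-chart.svg"
           xlink:show="embed"
           xlink:actuate="onLoad"/>

    <!-- Opened in a new window only when the reader requests it -->
    <details xlink:type="simple"
             xlink:href="sales-details.xml"
             xlink:show="new"
             xlink:actuate="onRequest">View the detailed figures
    </details>

</report>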

Extended links are much more complex. They are marked by the type "extended" and may contain locators (pointing to remote resources), local resources, arcs, and a title. An extended link acts like a wrapper that holds the resources and the arcs between them. This makes it possible to create a one-to-many link, something previously not possible in HTML. At XML.com there is a great article written by Fabio Arciniegas A. that describes this in more detail.
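
The pieces of an extended link can be sketched as follows. This is only an illustration: the element names (courses, courselinks, linktitle, course, material, go) and the example.com URLs are made up, and only the xlink: attributes carry XLink meaning. Both locators share the label "material", so the single arc from "course" to "material" describes a one-to-many link.

<?xml version="1.0"?>

<courses xmlns:xlink="http://www.w3.org/1999/xlink">

    <courselinks xlink:type="extended">

        <!-- Title: a human-readable description of the whole link -->
        <linktitle xlink:type="title">XML course materials</linktitle>

        <!-- Local resource: the starting point, inside this document -->
        <course xlink:type="resource" xlink:label="course">COIN78</course>

        <!-- Locators: remote resources that take part in the link -->
        <material xlink:type="locator"
                  xlink:href="http://www.example.com/week10-notes.xml"
                  xlink:label="material"
                  xlink:title="Week 10 notes"/>
        <material xlink:type="locator"
                  xlink:href="http://www.example.com/week10-homework.xml"
                  xlink:label="material"
                  xlink:title="Week 10 homework"/>

        <!-- Arc: traversal rule from the course to every "material" resource -->
        <go xlink:type="arc"
            xlink:from="course"
            xlink:to="material"
            xlink:show="new"
            xlink:actuate="onRequest"/>

    </courselinks>

</courses>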

XPointer

XPointer allows the hyperlinks to point to more specific parts in the XML document.

[Image: XPath Graph, from w3schools.com]

 

 

An XLink on its own points to an XML document as a whole. By adding an XPointer after the URL in the xlink:href attribute, we can navigate (with an XPath expression) to a specific place in that document. This is like using a named anchor in HTML. For example, suppose an XML file contains a list of dog breeds, each with a unique id. This id acts as the anchor to which the XPointer points. One of the dogs has id="Rottweiler", so the xlink:href attribute would look like this:

xlink:href="http://dog.com/dogbreeds.xml#xpointer(id('Rottweiler'))"

However, XPointer allows a shorthand form when linking to an element with an id. You can use the value of the id directly, like this:

xlink:href="http://dog.com/dogbreeds.xml#Rottweiler"
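
To make the dog breed example concrete, here is a sketch of what the two documents might look like. The structure of dogbreeds.xml and the element names (dogbreeds, dog, mydogs, mydog) are assumptions for illustration. Note that id('...') and the shorthand form generally rely on the id attribute being declared as type ID, for example in a DTD as shown here.

The target document, dogbreeds.xml:

<?xml version="1.0"?>
<!DOCTYPE dogbreeds [
  <!ELEMENT dogbreeds (dog+)>
  <!ELEMENT dog (#PCDATA)>
  <!-- Declaring id as type ID lets id('...') and the shorthand form find it -->
  <!ATTLIST dog id ID #REQUIRED>
]>

<dogbreeds>
    <dog id="Rottweiler">Rottweiler</dog>
    <dog id="Beagle">Beagle</dog>
</dogbreeds>

A document that links into it:

<?xml version="1.0"?>

<mydogs xmlns:xlink="http://www.w3.org/1999/xlink">

    <!-- Full form: an XPath expression inside xpointer() -->
    <mydog xlink:type="simple"
           xlink:href="http://dog.com/dogbreeds.xml#xpointer(id('Rottweiler'))">My Rottweiler
    </mydog>

    <!-- Shorthand form: the id value on its own -->
    <mydog xlink:type="simple"
           xlink:href="http://dog.com/dogbreeds.xml#Beagle">My Beagle
    </mydog>

</mydogs>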

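In the example below we use XPointer to point to the fifth item in a list with a unique id of "rock". The form shown follows the syntax of early XPointer working drafts; with the XPath-based xpointer() scheme used above, the same target would be written #xpointer(id('rock')/item[5]).

href="http://www.example.com/cdlist.xml#id('rock').child(5,item)"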


XInclude

XInclude is a generic mechanism for merging XML documents, by writing inclusion tags in the "main" document to automatically include other documents or parts thereof. The syntax leverages existing XML constructs - elements, attributes, and URI references.

XInclude differs from the linking features described in the XML Linking Language (XLink), specifically links with the attribute value show="embed". Such links provide a media-type independent syntax for indicating that a resource is to be embedded graphically within the display of the document. XLink does not specify a specific processing model, but simply facilitates the detection of links and recognition of associated metadata by a higher level application.

XInclude, on the other hand, specifies a media-type specific (XML into XML) transformation. It defines a specific processing model for merging information sets. XInclude processing occurs at a low level, often by a generic XInclude processor which makes the resulting information set available to higher level applications.

The resulting document becomes a single composite XML Information Set. For example, including the text file license.txt:

<?xml version="1.0"?>
...
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xi="http://www.w3.org/2001/XInclude">
   <head>...</head>
   <body>
      ...
      <p><xi:include href="license.txt" parse="text"/></p>
   </body>
</html>

gives:

<?xml version="1.0"?>
...
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xi="http://www.w3.org/2001/XInclude">
   <head>...</head>
   <body>
      ...
      <p>This document is published under GNU Free Documentation License</p>
   </body>
</html>

The mechanism is similar to HTML's <object> tag (which is specific to the HTML markup language), but the XInclude mechanism works with any XML format, such as SVG and XHTML.
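
XInclude can also merge XML into XML, which is its more typical use. The sketch below assumes two hypothetical files, chapter1.xml and chapter2.xml, and made-up element names (book, title, para). The first include pulls in a whole document. The second uses the xpointer attribute to pull in only the element whose ID is "summary", and supplies an xi:fallback to use if the resource cannot be retrieved.

<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
   <title>Course Reader</title>

   <!-- Include a whole XML document (parse="xml" is the default) -->
   <xi:include href="chapter1.xml" parse="xml"/>

   <!-- Include one element by ID, with fallback content -->
   <xi:include href="chapter2.xml" xpointer="summary">
      <xi:fallback>
         <para>The chapter 2 summary is not available.</para>
      </xi:fallback>
   </xi:include>
</book>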

This section is from Wikipedia.


Introduction to XML Sitemaps

An XML sitemap is a simple catalog, written in XML, that serves as an inventory of the pages in a website: when each page was last modified, how often it changes, and its indexing priority relative to the other pages on the site. Search engine robots read these documents to quickly determine which pages to index and how often to reindex the site. The value of an XML sitemap is that it helps search engines find and index a site's pages, which can improve the site's visibility in search results. The XML sitemap protocol is described at http://www.sitemaps.org/protocol.php

Google introduced Google Sitemaps so web developers can publish lists of links from across their sites. The basic premise is that some sites have a large number of dynamic pages that are only available through the use of forms and user entries. The sitemap files can then be used to indicate to a web crawler how such pages can be found. Google, MSN and Yahoo now jointly support the Google Sitemaps protocol and the sitemaps XML protocol.

Section 1

  1. Google XML site maps are an inventory, written in XML, of a website's pages and assets, including PDF files. This file, which is almost always named sitemap.xml, contains a list of URLs, each with its date last modified, change frequency, and indexing priority.
  2. Basic structure (XML code from http://en.wikipedia.org/wiki/Google_Sitemaps) A sample site map with a single record is shown here. It is very easy to code. The full XML element schema is documented at http://www.sitemaps.org/protocol.php
  3. An XSD schema for XML sitemaps is required. The schema for Google's XML site map model is shown here.
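
For reference, a minimal sitemap with a single entry might look like the sketch below. The element names and namespace follow the protocol documented at sitemaps.org; the URL, date, change frequency, and priority values are placeholders. The xsi:schemaLocation attribute is optional and points validators at the published XSD schema.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
                            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

    <!-- One url element per page; only loc is required -->
    <url>
        <loc>http://www.mysite.com/</loc>
        <lastmod>2009-05-14</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
    </url>

</urlset>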

Section 2

  1. How to make a Google XML site map. You can hand code it, or you can use a free service, such as XML Site Maps.com http://www.xml-sitemaps.com/. You simply type in the URL of your website, and in a matter of seconds, a site map is produced. Save it as sitemap.xml.
  2. How to make a Yahoo! XML site map - use the same service at http://www.xml-sitemaps.com/ but build a ROR file, which is done in essentially the same way. A description of the process for creating ROR files can be found at ROR Web http://www.rorweb.com/create-ror-file-rss.htm An example of a ROR file, ROR.xml, shows the basic structure, which looks like metadata, an XML site map, and an RSS channel all combined. But you can also add inventory to a ROR file, which makes it far more powerful, as shown in this file.
  3. Easy ways to hand code your own document - just follow the samples above, and create a checklist so that you don't forget a page.

Section 3

  1. Installing your site map. This requires you to be a Google webmaster. Sign up at this URL - http://www.google.com/webmasters/
  2. Letting Google know it's there. You need to upload the sitemap.xml file to your website, then let Google know you are ready for validation. Log into your Google service (you must have a Google account - it's easy, fast, and free) http://www.google.com/webmasters/sitemaps/. Follow the instructions, which require you to upload a second, blank HTML file whose filename contains a special code, confirming that you are the party responsible for maintaining the website. Then log back into your webmaster account to see if Google has indexed your site map file.
  3. How often to redo your site map? Within a few weeks you may see your site ranking improve, but keep your site map file current. You can see how often your sitemap.xml file, and your website, have been crawled by inspecting your site statistics in http://www.google.com/webmasters/.

Homework

  1. Build a Google site map - by hand if you can, or at least study the one generated by a site map tool
  2. Install the site map, and ensure that Google has indexed it. You will submit a link that looks like http://www.mysite.com/sitemap.xml as your homework.
  3. Visit the webmaster tools site frequently, and make use of the other tracking tools which can help you optimize website visibility in search engines.

References


Links to XML Related Sites

  1. XML.COM
  2. WDVL XML tutorial
  3. Sun Java XML Introduction
  4. IBM'S XML Website
  5. Google Directory on XML

 


Copyright © 2009 - 2010 Robert D. Cormia - May 14, 2009