COIN78 - XML WEEK 2: XML Syntax and Structure



Your First XML Document

In our first week coding XML you will build a simple address book with 5 or so records, using a total of 20 elements. It will follow the rules of XML syntax so that it is well-formed and, with our DTD next week, valid as well. The file is already built, so I'm hoping that you will modify this one or embark on your own theme.

  1. Your first XML file. The first thing we need to do is create an XML file, save it, and then open it in IE6 (or IE7). Use your favorite text editor (BBEdit or Notepad), or XML Pad, to type the document. In our file we will type <address_book></address_book> and then save the file. An XML document needs the file extension ".xml". After saving the file, open it with MSIE as you would open an HTML file. MSIE shows that it understands XML by displaying the tags in color. Why does it display "<address_book />"? We will talk about that later.

    <?xml version="1.0" encoding="UTF-8"?>  
    <address_book></address_book>
          

  2. Adding more. You just saw how simple it is to create and view an XML document, without our having talked about any of the specifics yet. Now we make a more elaborate document and start to build out the address_book.xml file; the concepts it uses are explained later on. In this example we have added information about the name.

    <?xml version="1.0" encoding="UTF-8"?>  
    <record>
    <name>
    <first_name>first name</first_name>
    <middle_name>middle name</middle_name>
    <last_name>last name</last_name>
    <nick_name>nick name</nick_name>
    </name>

    </record>

    In the next section we add the address information.

    <?xml version="1.0" encoding="UTF-8"?>
    <!--address book using nested elements-->
    <record>
    <name>
    <first_name>first name</first_name>
    <middle_name>middle name</middle_name>
    <last_name>last name</last_name>
    <nick_name>nick name</nick_name>
    </name>
    <address>
    <street_address>street address goes here</street_address>
    <street_address_detail>apartment number goes here</street_address_detail>
    <city>city goes here</city>
    <state>state goes here</state>
    <zipcode>zipcode goes here</zipcode>
    </address>
    </record>

    In the next section we add the contact information.

    <?xml version="1.0" encoding="UTF-8"?>
    <!--address book using nested elements-->
    <record>
    <name>
    <first_name>first name</first_name>
    <middle_name>middle name</middle_name>
    <last_name>last name</last_name>
    <nick_name>nick name</nick_name>
    </name>
    <address>
    <street_address>street address goes here</street_address>
    <street_address_detail>apartment number goes here</street_address_detail>
    <city>city goes here</city>
    <state>state goes here</state>
    <zipcode>zipcode goes here</zipcode>
    </address>
    <contact>
    <home_phone>home phone goes here</home_phone>
    <work_phone>work phone goes here</work_phone>
    <cell_phone>cell phone goes here</cell_phone>
    <fax_number>fax number goes here</fax_number>
    <email_address>email address goes here</email_address>
    </contact>
    </record>

    In the next section we add the comment information.

    <?xml version="1.0" encoding="UTF-8"?>
    <!--address book using nested elements-->
    <record>
    <name>
    <first_name>first name</first_name>
    <middle_name>middle name</middle_name>
    <last_name>last name</last_name>
    <nick_name>nick name</nick_name>
    </name>
    <address>
    <street_address>street address goes here</street_address>
    <street_address_detail>apartment number goes here</street_address_detail>
    <city>city goes here</city>
    <state>state goes here</state>
    <zipcode>zipcode goes here</zipcode>
    </address>
    <contact>
    <home_phone>home phone goes here</home_phone>
    <work_phone>work phone goes here</work_phone>
    <cell_phone>cell phone goes here</cell_phone>
    <fax_number>fax number goes here</fax_number>
    <email_address>email address goes here</email_address>
    </contact>
    <comments>
    <misc_comments>comments go here</misc_comments>
    </comments>
    </record>

    Now that we have all the information defined for one person, we need to be able to add more people to build our address book. To do that we add one record per person and wrap all the records in a single <address_book> element. To distinguish between records we give each one an ID attribute with a unique number.

    <?xml version="1.0" encoding="UTF-8"?>
    <!--address book using nested elements-->
    <address_book>
    <record ID="1">
    <name>
    <first_name>first name</first_name>
    <middle_name>middle name</middle_name>
    <last_name>last name</last_name>
    <nick_name>nick name</nick_name>
    </name>
    <address>
    <street_address>street address goes here</street_address>
    <street_address_detail>apartment number goes here</street_address_detail>
    <city>city goes here</city>
    <state>state goes here</state>
    <zipcode>zipcode goes here</zipcode>
    </address>
    <contact>
    <home_phone>home phone goes here</home_phone>
    <work_phone>work phone goes here</work_phone>
    <cell_phone>cell phone goes here</cell_phone>
    <fax_number>fax number goes here</fax_number>
    <email_address>email address goes here</email_address>
    </contact>
    <comments>
    <misc_comments>comments go here</misc_comments>
    </comments>
    </record>
    <record ID="2">
    <name>
    <first_name>first name</first_name>
    <middle_name>middle name</middle_name>
    <last_name>last name</last_name>
    <nick_name>nick name</nick_name>
    </name>
    <address>
    <street_address>street address goes here</street_address>
    <street_address_detail>apartment number goes here</street_address_detail>
    <city>city goes here</city>
    <state>state goes here</state>
    <zipcode>zipcode goes here</zipcode>
    </address>
    <contact>
    <home_phone>home phone goes here</home_phone>
    <work_phone>work phone goes here</work_phone>
    <cell_phone>cell phone goes here</cell_phone>
    <fax_number>fax number goes here</fax_number>
    <email_address>email address goes here</email_address>
    </contact>
    <comments>
    <misc_comments>comments go here</misc_comments>
    </comments>
    </record>
    </address_book>
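
One way to confirm that the ID attribute really does tell the records apart is to read the file with a parser. The sketch below uses Python's standard `xml.etree.ElementTree` module, which is an assumption on our part (the course itself only uses a browser), and a cut-down address book with made-up sample names. The helper name `list_records` is hypothetical.

```python
import xml.etree.ElementTree as ET

# A trimmed-down address book in the same shape as the one above.
# The names are sample values chosen for illustration only.
ADDRESS_BOOK = """<?xml version="1.0" encoding="UTF-8"?>
<address_book>
  <record ID="1">
    <name>
      <first_name>Robert</first_name>
      <last_name>Cormia</last_name>
    </name>
  </record>
  <record ID="2">
    <name>
      <first_name>Jane</first_name>
      <last_name>Doe</last_name>
    </name>
  </record>
</address_book>"""


def list_records(xml_text):
    """Return (ID, first name, last name) for every <record> in the book."""
    # Encode to bytes so the parser honors the encoding declaration itself.
    root = ET.fromstring(xml_text.encode("utf-8"))
    rows = []
    for record in root.findall("record"):
        rows.append((
            record.get("ID"),                    # the attribute value
            record.findtext("name/first_name"),  # text of a nested element
            record.findtext("name/last_name"),
        ))
    return rows


for row in list_records(ADDRESS_BOOK):
    print(row)
```

Each record comes back with its own ID, which is what lets a program pick out one person from the book.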

Commenting

  1. Comments. One of the first things to learn about any language is its commenting syntax. XML comments should be familiar to you, as the syntax is almost the same as HTML's. (Actually these are SGML comments, as you will read later in your book.)

    <?xml version="1.0" encoding="UTF-8"?>  
    <!-- This is the start of an address book in XML -->
        

  2. Except... You cannot use "--" inside a comment. If you like long lines of dashes as easy visual markers, such as
    "<!-- -----------SOME INFO--------- -->", replace the "--" with "=="
    instead: "<!-- ============SOME INFO=========== -->"

  3. Some possible alternatives. Use any of these commenting lines.

    <?xml version="1.0" encoding="UTF-8"?>  
    <!-- ============= an address book in XML =========== -->
    <record>
    <name>
    <first_name>first name</first_name>
    <middle_name>middle name</middle_name>
    <last_name>last name</last_name>
    <nick_name>nick name</nick_name>
    </name>
    </record>
    <!-- ____________ an address book in XML ___________ -->
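
To see the "--" rule in action without a browser, the comment styles can be fed to a parser. This is a minimal sketch that assumes Python's standard `xml.etree.ElementTree` module; the helper name `parses` and the one-element test documents are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the parser accepts xml_text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Three one-element documents that differ only in the comment's marker line.
dashes = "<!-- -----------SOME INFO--------- --><record/>"
equals = "<!-- ============SOME INFO=========== --><record/>"
underscores = "<!-- ____________SOME INFO___________ --><record/>"

print(parses(dashes))       # the comment contains "--", so it is rejected
print(parses(equals))       # "==" is fine inside a comment
print(parses(underscores))  # so is "__"
```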

 

Well-Formed XML Document

  1. Root Element. For a document to be well-formed it must follow the syntax rules of the XML recommendation. To begin with, a document must have a root element. As an example, "<html>" is the root element in an HTML document; "<address_book>" is our root element. A later guideline covers the order in which elements open and close.

    <?xml version="1.0" encoding="UTF-8"?>
    <address_book></address_book>
        

  2. One and only one... root element is allowed. A document cannot contain two or more root elements. In this example there are two instances of "<address_book></address_book>" at the top level. You can take a look at the source code of this document if you need to see the code itself. Your browser should return an error.

    <?xml version="1.0" encoding="UTF-8"?>  
    <address_book>  </address_book>  
    <address_book>  </address_book>
    
    When rendered an error is generated:
    
    XML Parsing Error: junk after document element
    Location: http://fgamedia.org/faculty/rdcormia/COIN78/files/xmlExamples/xml6.xml
    Line Number 4, Column 1: <address_book>
    ^

  3. FOLE: First Open Last End. This is a play on the programming term FILO (First In Last Out). Another guideline that an XML document needs to meet is the order of the end tags, FOLE. Basically, you are required to preserve the symmetry of element nesting: the first element opened is the last element closed. Put another way, close last what you opened first, and close first what you opened last. The best example is the root element; it is the first element to open and the last to close.

    <?xml version="1.0" encoding="UTF-8"?>
    <address_book>
    <record>
    <name>
    <first_name>first name goes here</first_name>
    <middle_name>middle name goes here</middle_name>
    <last_name>last name goes here</last_name>
    <nick_name>nick name goes here</nick_name>
    </name>
    </record>
    </address_book>

  4. Do not try this at home. In this simple example, the order of "</names>" and "</name>" was switched. This would cause any XML parser to choke. Actually, you will get a polite error statement about a mismatch, including the line and column number of the first "offense". For those of you using Dreamweaver to code your XML (never too early), the yellow tag error marker will also highlight the mismatch.

    <address_book>     
        <record>
            <names>
               <name>#</names>
            </name>
        </record>
    </address_book>  
    
    When rendered an error is generated:
    
    
    XML Parsing Error: mismatched tag. Expected: </name>.
    Location: http://fgamedia.org/faculty/rdcormia/COIN78/files/xmlExamples/xml8.xml
    Line Number 4, Column 19: <name>#</names>
    ------------------^

  5. alpha under case-no "xml". An XML tag name must start with a letter (a-z or A-Z) or an underscore (_), and tag names are case-sensitive. A tag name must not begin with the character combination "xml", in any mix of upper and lower case; it is a reserved name (we'll talk about reserved names later). By the way, a period is allowed inside a name:

    <?xml version="1.0" encoding="UTF-8"?>
    <name>
    <_firstname>first name goes here</_firstname>
    <middle_name>middle name goes here</middle_name>
    <last_name>last name goes here</last_name>
    <nick.name>nick name goes here</nick.name>
    <!--
    a period "." can be used in the element name - but not at the beginning
    -->
    </name>

  6. Case matters! Unlike HTML, XML is fully case-sensitive. In this example the closing tag for "<record>", "</record>", was replaced by "</Record>". To your computer, character codes are all that matter: an upper case "A" is no more similar to a lower case "a" than it is to "z". Try to stay lower case as a good habit.

    <address_book>
        <record ID="1">
            <name>
                <first_name>first name goes here</first_name>
                <middle_name>middle name goes here</middle_name>
                <last_name>last name goes here</last_name>
                <nick_name>nick name goes here</nick_name>
            </name>
        </Record>
    </address_book>
    
    When rendered an error is generated:
    
    XML Parsing Error: mismatched tag. Expected: </record>.
    Location: http://fgamedia.org/faculty/rdcormia/COIN78/files/xmlExamples/xml10.xml
    Line Number 12, Column 4: </Record>
    ----------^
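
The well-formedness rules above can also be checked outside the browser. The sketch below is one possible way to do it, assuming Python's standard `xml.etree.ElementTree` module; the helper name `well_formed_error` and the short test strings are made up for the illustration, and the exact wording of the messages comes from Python's parser rather than from your browser.

```python
import xml.etree.ElementTree as ET


def well_formed_error(xml_text):
    """Return None if xml_text is well-formed, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# One root element, properly nested: well-formed.
one_root = "<address_book><record><name>#</name></record></address_book>"

# Two root elements: not allowed.
two_roots = "<address_book></address_book>\n<address_book></address_book>"

# </names> and </name> switched: breaks First Open Last End.
bad_nesting = "<address_book><names><name>#</names></name></address_book>"

# </Record> does not match <record>: case matters.
bad_case = "<address_book><record></Record></address_book>"

for label, text in [
    ("one root", one_root),
    ("two roots", two_roots),
    ("bad nesting", bad_nesting),
    ("bad case", bad_case),
]:
    print(label, "->", well_formed_error(text))
```

The parser stops at the first problem it finds, so a file with several mistakes has to be fixed and re-checked one error at a time.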

 

Anatomy of an XML Document

  1. Attributes. Like HTML elements, the elements of an XML document can carry attributes. An attribute extends an element's ability to structure a document by packing in additional information about that element.

    <?xml version="1.0" encoding="UTF-8"?>
    <address_book>
    <record ID="1">
    <!--
    "ID" is an attribute that holds the record number. The value of that attribute is "1"
    -->
    </record>
    </address_book>

  2. Entity References. An entity reference is a placeholder that the XML parser replaces with other text while it parses the document. Entities are not new at all. The page you are viewing uses the entity reference "&lt;" to display the "<" character. This is necessary because HTML tags, and XML tags as well, are marked by being enclosed in "< >" characters, so a literal "<" in the data would be read by the parser as the start of a tag instead of being displayed. The XML language has 5 built-in entity references: (&lt; <), (&gt; >), (&amp; &), (&apos; '), and (&quot; "). These entity references are derived from SGML, hence their appearance in HTML. The snippet below shows what we would like our data to look like:

    <?xml version="1.0" encoding="UTF-8"?>
    <quiz>
        <question type="multiple" number="1">
            Which is (are) the root element of an HTML document?      
            <answers>
                <answer choice="a"><HTML></answer>
                <answer choice="b"></HTML></answer>
                <answer choice="c"><HTML> & <body></answer>
                <answer choice="d" correct="true"><HTML> & </HTML></answer>
            </answers>
        </question>
    </quiz>    

    In order for our data to look this way it needs to be written as:

    <?xml version="1.0" encoding="UTF-8"?>
      <quiz>
        <question type="multiple" number="1">
          Which is (are) the root element of an HTML document?
          <answers>
            <answer choice="a">&lt;HTML&gt;</answer>
            <answer choice="b">&lt;/HTML&gt;</answer>
            <answer choice="c">&lt;HTML&gt; &amp; &lt;body&gt;</answer>
            <answer choice="d" correct="true">&lt;HTML&gt; &amp; &lt;/HTML&gt;</answer>
          </answers>
        </question>
      </quiz>

  3. CDATA? PCDATA? So what type of stuff can be placed within an element? Here we come across CDATA and PCDATA. For starters, any text placed within an element is by default Parsed Character DATA (PCDATA). This means the XML parser scans the text for markup and entity references. In contrast to PCDATA is plain old Character DATA (CDATA), text that the parser passes through without looking for markup. As you remember, in our Entity example we had to write &lt;HTML&gt; so that the parser would deliver the text <HTML>. Inside a CDATA section, which starts with "<![CDATA[" and ends with "]]>", the text is not parsed, so there is no need for entity references. White space in a CDATA section is likewise passed through exactly as typed.

    <question type="multiple" number="2">
        Which is not a form of living matter?
        <answers>                 
            <answer choice="a" ><![CDATA[animal]]></answer>
            <answer choice="b"><![CDATA[plant]]></answer>
            <answer choice="c" ><![CDATA[bacteria]]></answer>
            <answer choice="d" correct="true"><![CDATA[minerals]]></answer>
        </answers>
    </question>  
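
Entity references and CDATA sections are two routes to the same text. The following sketch, which assumes Python's standard `xml.etree.ElementTree` and `xml.sax.saxutils` modules (neither is part of the course material), writes answer "d" from the quiz both ways and compares what the parser delivers.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

wanted = "<HTML> & </HTML>"

# Route 1: escape the markup characters with entity references.
escaped_xml = "<answer>&lt;HTML&gt; &amp; &lt;/HTML&gt;</answer>"

# Route 2: wrap the raw text in a CDATA section, no entities needed.
cdata_xml = "<answer><![CDATA[<HTML> & </HTML>]]></answer>"

text_from_entities = ET.fromstring(escaped_xml).text
text_from_cdata = ET.fromstring(cdata_xml).text

print(text_from_entities)
print(text_from_cdata)
print(text_from_entities == text_from_cdata)

# escape() produces the entity form of <, > and & for you.
print(escape(wanted))

# All five built-in entity references in one element.
all_five = ET.fromstring("<t>&lt; &gt; &amp; &apos; &quot;</t>").text
print(all_five)
```

Either form is fine for a parser; CDATA is mostly a convenience for humans typing text that is full of "<" and "&" characters.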

3 Ways to Hold Data

  1. Nested Elements (tags only).

    Nested elements are used to hold child elements or large blocks of text. Humans tend to code with (nested) elements.

    In the example below we have elements holding data and other elements referred to as child elements. The data for the <first_name> element is Robert. The element <contact> holds 4 other elements, <first_name>, <last_name>, <nick_name>, and <email_address>.


    <contact>
        <first_name>Robert</first_name>
        <last_name>Cormia</last_name>
        <nick_name>Carbon Bob</nick_name>
        <email_address>rdcormia@earthlink.net</email_address>
    </contact>
    

  2. Empty Elements (attributes)

    Elements may contain attributes which are name/value pairs. You have seen them in the <meta> and the <img> tags. Attributes add 'granularity' to the definition (or description) of data (see mixed elements below) and are usually written by machines. Notice that empty elements close themselves.

    <meta name="" value="" />
    
    <meta name="description" value="description of the document" />
    <meta name="keywords" value="keywords in the document" />
    <meta name="author" value="author of the document" />
    <meta name="copyright" value="copyright of the document" />
    <meta name="fears" value="spiders, snakes, insects" />
    <meta name="aptitude" value="interpersonal, instruction, counseling" />
    
    <img src="../../images/notes_xml.jpg" width="100" height="100" alt="XML" />
          

  3. Mixed Elements (tags and attributes)

    Elements may contain either attributes and text or attributes and other elements. Notice that mixed elements do NOT close themselves.

    <name language="English">Cat</name>
    <name language="Latin">Cattus</name>
    
    <weight units="pounds">150</weight>
    <weight units="kilograms">68.2</weight>      
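
The difference between the three styles shows up in where a program has to look for the data. Below is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module, that reads one example of each style taken from the listings above.

```python
import xml.etree.ElementTree as ET

# Nested: data is held as the text of child elements.
nested = ET.fromstring(
    "<contact>"
    "<first_name>Robert</first_name>"
    "<last_name>Cormia</last_name>"
    "</contact>"
)

# Empty: the element closes itself and all data lives in attributes.
empty = ET.fromstring('<meta name="keywords" value="keywords in the document" />')

# Mixed: an attribute qualifies the text the element holds.
mixed = ET.fromstring('<weight units="pounds">150</weight>')

print(nested.findtext("first_name"), nested.findtext("last_name"))
print(empty.text, empty.attrib)        # no text, only name/value pairs
print(mixed.text, mixed.get("units"))  # both text and an attribute
```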

Example Files

Homework

Go to the week 3 tutorial, then create two XML models, one nested and one empty.

Email these files to me as attachments, as described in assignment one, to rdcormia@earthlink.net.

I encourage you not to build an address book as your 'theme', but if you do, use the address_book_nested.xml and address_book_empty.xml files as a start and please make significant changes to the model.

 

 


Copyright © 2009 - 2010 Robert D. Cormia - October 15, 2009