XML Mini-Tutorial
Michael I. Schwartzbach
Copyright © 2000 BRICS, University of Aarhus
http://www.brics.dk/~mis/ITU/XML/
What is XML?
HTML vs. XML
A conceptual view of XML
A concrete view of XML
Applications of XML
XML technologies
Namespaces
The recipe example
Schema languages
A schema for recipes
XLink, XPointer, and XPath
Pointing at recipes
XML-QL
Querying the recipes
XSLT
A style sheet for recipes
Exercises
http://www.brics.dk/~mis/ITU/XML/ [18/09/2000 14:24:26]
HTML, JavaScript, and XML Mini-Tutorials
Michael I. Schwartzbach
Copyright © 2000 BRICS, University of Aarhus
http://www.brics.dk/~mis/ITU/
These mini-tutorials were created as part of the course Internet Programming at the IT-University of Copenhagen.
HTML (PDF)
JavaScript (PDF)
XML (PDF)
What is XML?
XML is a framework for defining markup languages:
- there is no fixed collection of markup tags;
- each XML language targets its own application domain;
- the languages will share many features;
- there is a common set of tools for processing such languages.
XML is not a replacement for HTML:
- HTML should ideally be just another XML language;
- in fact, XHTML is just that: a (very popular) XML language for hypertext markup.
XML is designed to:
- separate syntax from semantics;
- support internationalization (Unicode) and platform independence;
- be the future of structured information, including databases.
HTML vs. XML
Consider the following recipe collection published in HTML:
Rhubarb Cobbler
[email protected]
Wed, 14 Jun 95
Rhubarb Cobbler made with bananas as the main sweetener.
It was delicious. Basically it was
2 1/2 cups diced rhubarb (blanched with boiling water, drained)
2 tablespoons sugar
2 fairly ripe bananas sliced 1/4" round
1/4 teaspoon cinnamon
dash of nutmeg
Combine all and use as cobbler, pie, or crisp.
Related recipes: Garden Quiche
There are many problems with this approach:
- the semantics is encoded into text formatting tags;
- there is no means of checking that a recipe is encoded correctly;
- it is difficult to change the layout of recipes (CSS is not enough).
It would be much better to invent a special recipe markup language:
Rhubarb Cobbler
[email protected]
Wed, 14 Jun 95
Rhubarb Cobbler made with bananas as the main sweetener.
It was delicious.
...
Combine all and use as cobbler, pie, or crisp.
Garden Quiche
This example illustrates:
- the markup tags are chosen purely for logical structure;
- this is just one choice of markup detail level;
- we need a kind of "grammar" for XML recipe collections;
- we need a stylesheet to define presentation semantics.
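To see why purely logical markup pays off, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The tag and attribute names (recipe, title, date, ingredient, amount, unit, preparation, related) are hypothetical, since the actual recipe markup is not reproduced here; the point is that a program can pick out the ingredients from the structure alone, without guessing from formatting.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe markup: every tag names a logical part, not a layout.
RECIPE = """<recipe>
  <title>Rhubarb Cobbler</title>
  <date>Wed, 14 Jun 95</date>
  <ingredient amount="2 1/2" unit="cups">diced rhubarb</ingredient>
  <ingredient amount="2" unit="tablespoons">sugar</ingredient>
  <ingredient amount="2">fairly ripe bananas</ingredient>
  <ingredient amount="1/4" unit="teaspoon">cinnamon</ingredient>
  <ingredient>dash of nutmeg</ingredient>
  <preparation>Combine all and use as cobbler, pie, or crisp.</preparation>
  <related>Garden Quiche</related>
</recipe>"""

root = ET.fromstring(RECIPE)

# The element names, not the formatting, tell us what each text means.
title = root.findtext("title")
ingredients = [ing.text for ing in root.findall("ingredient")]
# Ingredients that carry no unit attribute (counted items, a "dash", ...).
unitless = [ing.text for ing in root.findall("ingredient") if ing.get("unit") is None]

print(title)
for ing in root.findall("ingredient"):
    print(ing.get("amount", ""), ing.get("unit", ""), ing.text)
```

How the recipe is displayed is left entirely to a separate stylesheet; the data itself stays untouched when the layout changes.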
A conceptual view of XML
An XML document is a labeled tree.
- a leaf node is
  - character data (a text string): the actual data,
  - a processing instruction: annotations for various processors, typically in the document header,
  - a comment: never any semantics attached,
  - an entity declaration: simple macros.
- an internal node is an element, which is labeled with
  - a name, and
  - a set of attributes, each consisting of a name and a value.
Often, comments and entity declarations are not explicitly represented in the tree.
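The tree view can be made concrete with Python's xml.etree.ElementTree: each element becomes a node carrying a name (tag) and a set of attributes (attrib), while character data is stored as text on the elements (in .text and .tail) rather than as separate node objects. The small document and the helper name describe are hypothetical. The default parser also drops comments, which matches the remark that comments are often not represented in the tree.

```python
import xml.etree.ElementTree as ET

# A tiny hypothetical document; the comment is discarded by the default parser.
DOC = ('<recipe id="r1" lang="en"><title>Rhubarb Cobbler</title>'
       '<!-- a note --><step>Combine all.</step></recipe>')


def describe(elem, depth=0):
    """List the tree: elements (internal nodes) with name and attributes,
    character data (leaves) as text."""
    pad = "  " * depth
    lines = [f"{pad}element {elem.tag} {dict(elem.attrib)}"]
    if elem.text and elem.text.strip():
        lines.append(f"{pad}  text {elem.text.strip()!r}")
    for child in elem:
        lines.extend(describe(child, depth + 1))
        # Text that follows a child element is stored in the child's .tail.
        if child.tail and child.tail.strip():
            lines.append(f"{pad}  text {child.tail.strip()!r}")
    return lines


for line in describe(ET.fromstring(DOC)):
    print(line)
```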
A concrete view of XML
An XML document is a (Unicode) text with markup tags and other meta-information.
Markup tags denote elements:
<foo attr="val">.........</foo>
- <foo attr="val"> is an element start tag with name foo;
- attr="val" is an attribute with name attr and value val (values are enclosed by ' or ");
- ......... is the contents of the element;
- </foo> is the matching element end tag.
There is a shorthand notation for empty elements: <foo attr="val"/> is equivalent to <foo attr="val"></foo>.
Note: XML is case-sensitive, so <foo> and <Foo> are different tags!
An XML document must be well-formed:
- start and end tags must match;
- element tags must be properly nested;
- there are some more subtle syntactic requirements.
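A well-formedness check needs nothing but a parser. This minimal sketch uses the standard library's ElementTree, which raises ParseError on mismatched, badly nested, or wrongly cased tags; the function name is_well_formed is made up for illustration. It checks well-formedness only, not validity against a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b/></a>"))      # tags match and are properly nested
print(is_well_formed("<a><b></a></b>"))   # improper nesting
print(is_well_formed("<a></A>"))          # case matters: a is not A
```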
Special characters can be escaped using entity references or Unicode character references:
- &amp; yields &;
- &lt; and &#60; both yield <.
The strange syntax is a legacy from SGML.
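Python's standard library can produce and resolve these references. This minimal sketch uses xml.sax.saxutils.escape for writing text and the ElementTree parser for reading it back; the sample strings are arbitrary.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

text = "if a < b & c > d"

# escape() replaces &, < and > with the references &amp;, &lt; and &gt;.
escaped = escape(text)
print(escaped)

# A parser resolves both entity references and numeric character references:
# &#60; (decimal) and &#x3C; (hexadecimal) both denote '<'.
doc = ET.fromstring("<t>&amp; &lt; &#60; &#x3C;</t>")
print(doc.text)

# unescape() undoes escape() for the same three references.
print(unescape(escaped) == text)
```

For attribute values, xml.sax.saxutils.quoteattr additionally takes care of quotation marks.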
Applications of XML
There are already hundreds of serious applications of XML.
XHTML
W3C's XMLization of HTML 4.0. Example XHTML document:
Hello world!
foobar
CML
Chemical Markup Language. Example CML document snippet:
C O H H H H
-0.748 0.558 -1.293 -1.263 -0.699 0.716
WML
Wireless Markup Language for WAP services:
Hello World
There is a long list of other XML applications.
XML technologies
Just a notation for trees is not enough:
- the real strength of XML lies in generic languages and tools!
The XML vision offers:
- namespaces: to avoid name clashes when a document uses several "sub-languages";
- schemas: grammars to define classes of documents;
- linking between documents: a generalization of HTML anchors and links;
- addressing parts of documents: it is not enough that only the author can place anchors;
- transformation: conversion from one document class to another;
- querying: extraction of information.
The site www.xmlsoftware.com has a comprehensive list of available XML tools.
Namespaces
Consider an XML language WidgetML which uses XHTML as a sublanguage for help messages:
Description of gadget
Gadget
A gadget contains a big gizmo
We have some problems here:
- the meaning of head and big depends on the context;
- this complicates things for processors and might even cause ambiguities;
- the root of the problem is that all names share one common namespace.
The solution is to introduce explicit namespace declarations:
Description of gadget
Gadget
A gadget contains a big gizmo
Do not be confused by the use of URIs for namespaces:
- they are not supposed to point to anything;
- they are simply the cheapest way of getting unique names;
- we rely on existing organizations that control domain names.
All XML technologies (are supposed to) respect namespaces.
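Python's ElementTree respects namespaces by expanding every qualified name to {URI}localname, so a WidgetML head and an XHTML head can no longer be confused. In this sketch the WidgetML namespace URI http://example.org/widgetml and the element layout are hypothetical stand-ins for the gadget example; http://www.w3.org/1999/xhtml is the actual XHTML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical WidgetML document embedding XHTML for its help message.
DOC = """<widget xmlns="http://example.org/widgetml"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <head>Description of gadget</head>
  <help>
    <xhtml:head><xhtml:title>Gadget</xhtml:title></xhtml:head>
    <xhtml:body>A gadget contains a <xhtml:big>big</xhtml:big> gizmo</xhtml:body>
  </help>
</widget>"""

# Prefixes used in queries are local choices; only the URIs have to match.
NS = {"w": "http://example.org/widgetml", "h": "http://www.w3.org/1999/xhtml"}

root = ET.fromstring(DOC)

# Every tag is stored as "{namespace URI}local name".
for elem in root.iter():
    print(elem.tag)

widget_head = root.find("w:head", NS)             # WidgetML's own head
xhtml_title = root.find(".//h:head/h:title", NS)  # head inside the XHTML help
print(widget_head.text, "|", xhtml_title.text)
```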
The recipe example
Consider the following raw data describing some (Danish) recipes:
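- citrontærte (lemon tart);
- farsbrød (meatloaf);
- hornfisk (garfish);
- islagkage (ice-cream layer cake);
- laksemousse (salmon mousse);
- nougattoppe (nougat tops);
- rabarberdessert (rhubarb dessert);
- smørrebrød (open sandwiches).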
We can represent this collection as an XML document.
Schema languages
The syntax of a new XML language must be formalized:
- this is similar to the formal syntax of a programming language;
- however, ordinary context-free grammars are not expressive enough;
- XML languages are described using schemas.
A modern schema language:
- is itself an XML language (and can be used to describe itself);
- imposes constraints on the contents of elements;
- is context-sensitive and very fine-grained;
- can be processed efficiently.
A schema processor:
- checks that an application document satisfies the schema;
- a document that satisfies the schema is called valid.
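The job of a schema processor can be sketched in a few lines of Python. This is not DSD: the content model below (element names, allowed children, required attributes) is hypothetical and far weaker than a real schema language, since it ignores the order and number of children and any constraints on character data.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: element name -> (allowed child element names,
# required attribute names). Order and number of children are not checked.
SCHEMA = {
    "collection": ({"recipe"}, set()),
    "recipe": ({"title", "ingredient", "preparation"}, set()),
    "title": (set(), set()),
    "ingredient": (set(), {"name", "amount"}),
    "preparation": (set(), set()),
}


def validate(elem, schema=SCHEMA):
    """Return a list of violations in the subtree rooted at elem (empty if valid)."""
    if elem.tag not in schema:
        return [f"undeclared element {elem.tag}"]
    allowed, required = schema[elem.tag]
    errors = [f"{elem.tag}: missing attribute {a}"
              for a in sorted(required - set(elem.attrib))]
    for child in elem:
        if child.tag in allowed:
            errors.extend(validate(child, schema))
        else:
            errors.append(f"{elem.tag}: {child.tag} not allowed here")
    return errors


good = ET.fromstring(
    '<collection><recipe><title>Rhubarb Cobbler</title>'
    '<ingredient name="sugar" amount="2"/>'
    '<preparation>Combine all.</preparation></recipe></collection>')
bad = ET.fromstring(
    '<collection><recipe><ingredient name="sugar"/><price/></recipe></collection>')

print(validate(good))
print(validate(bad))
```

An empty list means the document is valid with respect to this toy schema; each string describes one violated constraint.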
A schema for recipes
The following is a complete schema for the recipe example, written in the DSD schema language:
XML: XLink, XPointer, and XPath
XLink, XPointer, and XPath are three related mechanisms:
- they generalize the link mechanisms from HTML;
- XPath addresses a set of nodes in an XML document from outside the document;
- XPointer uses XPath to directly generalize HTML links;
- XLink uses XPointer to vastly generalize HTML links.
HTML links are just too simple:
- an anchor must be placed at every link destination (a problem with read-only documents)
  - we want to express relative locations;
- the link definition must be at the same location as the link source
  - we want out-of-line links ("link databases");
- only individual nodes can be linked to
  - we want links to whole tree fragments;
- a link always has one source and one destination
  - we want links with multiple sources and destinations.
The XLink pointer model looks like this:
[Figure: the XLink pointer model]
These technologies are hardly supported by any browser today.
Pointing at recipes
The following simple XPath expressions point to parts of the XML recipe document:
//ingrediens[@navn="radiser i små tern"]/@antal
200
//ingrediens[@antal="100" and @enhed="g"]/@navn
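flødeost med løg og urter (cream cheese with onion and herbs)
blødt smør i mindre stykker (soft butter in small pieces)
Feta ost 45+ (feta cheese 45+)
smeltet overtrækschokolade (melted coating chocolate)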
//titel[text()="Citrontærte"]/following-sibling::ingrediens[@navn="dej"]/tilberedning/text()
Bland mel og sukker i en skål. Skær smørret i mindre stykker og
smuldr det i melblandingen, til den ligner revet ost. Tilsæt vand
og saml hurtigt dejen. Tryk den ud i en smurt springform (ca. 22 cm
i diameter). Lad dejen gå halvt op ad formens side. Stil den
tildækket i køleskabet i mindst 1 time. Forbag bunden midt i ovnen
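i 12 minutter ved 200 grader.
(In English: Mix flour and sugar in a bowl. Cut the butter into small pieces and crumble it into the flour mixture until it resembles grated cheese. Add water and quickly gather the dough. Press it out in a greased springform pan (approx. 22 cm in diameter). Let the dough reach halfway up the side of the pan. Place it, covered, in the refrigerator for at least 1 hour. Prebake the base in the middle of the oven for 12 minutes at 200 degrees.)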
XPath expressions navigate step by step through the XML tree.
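Python's standard xml.etree.ElementTree supports only a limited subset of XPath: predicates such as [@navn='...'] work, but attribute steps (/@antal), the and operator, and the following-sibling axis do not, so this sketch emulates those parts in plain Python. A full XPath 1.0 engine, such as the one in the third-party lxml library, would accept the expressions above directly. The document excerpt, including the element names samling and opskrift and the title Eksempel, is hypothetical and is not the real recipe file; only titel, ingrediens, tilberedning and their attributes come from the expressions above.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt shaped like the recipe document (not the real file).
DOC = """<samling>
  <opskrift>
    <titel>Eksempel</titel>
    <ingrediens navn="flødeost med løg og urter" antal="100" enhed="g"/>
    <ingrediens navn="radiser i små tern" antal="200"/>
  </opskrift>
  <opskrift>
    <titel>Citrontærte</titel>
    <ingrediens navn="dej">
      <ingrediens navn="blødt smør i mindre stykker" antal="100" enhed="g"/>
      <tilberedning>Bland mel og sukker i en skål.</tilberedning>
    </ingrediens>
  </opskrift>
</samling>"""

root = ET.fromstring(DOC)


def amount_of(root, navn):
    """//ingrediens[@navn=navn]/@antal (navn must not contain a single quote)."""
    elem = root.find(f".//ingrediens[@navn='{navn}']")
    return None if elem is None else elem.get("antal")


def names_with(root, antal, enhed):
    """//ingrediens[@antal=antal and @enhed=enhed]/@navn ('and' done in Python)."""
    return [e.get("navn") for e in root.findall(f".//ingrediens[@antal='{antal}']")
            if e.get("enhed") == enhed]


def preparation_after(root, titel, navn):
    """//titel[text()=titel]/following-sibling::ingrediens[@navn=navn]/tilberedning/text()"""
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag == "titel" and child.text == titel:
                # Emulate following-sibling:: by scanning the later siblings.
                for sib in children[i + 1:]:
                    if sib.tag == "ingrediens" and sib.get("navn") == navn:
                        return sib.findtext("tilberedning")
    return None


print(amount_of(root, "radiser i små tern"))
print(names_with(root, "100", "g"))
print(preparation_after(root, "Citrontærte", "dej"))
```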
XML-QL
XML-QL is a query language for XML documents:
- XML documents can be seen as generalizations of database relations;
- XML-QL is a similar generalization of SQL;
- it can extract data from existing XML documents and construct new XML documents.
Relations are special, restricted cases of XML trees:
[Figure: a relation viewed as a restricted XML tree]
Standardized XML query languages are not expected to be released until 2001.
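The claim that relations are restricted XML trees can be sketched in Python: each tuple becomes a row element whose children are named after the columns. The function name relation_to_xml, the element name row, and the sample data are hypothetical choices. The restriction shows in the result: every row has exactly the same flat children, whereas XML trees in general may nest and vary from element to element.

```python
import xml.etree.ElementTree as ET


def relation_to_xml(name, columns, rows):
    """Encode a relation (a list of equal-length tuples) as a flat XML tree."""
    table = ET.Element(name)
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("every tuple must have one value per column")
        row_elem = ET.SubElement(table, "row")
        for column, value in zip(columns, row):
            ET.SubElement(row_elem, column).text = str(value)
    return table


ingredients = relation_to_xml(
    "ingredients", ["name", "amount", "unit"],
    [("sugar", 2, "tablespoons"), ("cinnamon", 0.25, "teaspoon")],
)
print(ET.tostring(ingredients, encoding="unicode"))

# Roughly the tree analogue of SELECT name FROM ingredients.
print([row.findtext("name") for row in ingredients.findall("row")])
```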
Querying the recipes
The following XML-QL queries extract information from the XML recipe document:
WHERE
$t
IN "karoline.xml"
CONSTRUCT $t