Saturday, April 17, 2021

SGML AND XML

 

SGML


Standard generalized markup language (SGML) is a text markup standard that underlies widely used markup languages: HTML (hypertext markup language) was originally defined as an application of SGML, and XML (extensible markup language) is a simplified subset of it.



SGML is used for marking up documents and has the advantage of not being dependent on a specific application. It is derived from GML (generalized markup language), which allowed users to work on standardized formatting styles for electronic documents.

Standard generalized markup language has two defining characteristics:
       Descriptive Markup
       Document Types

Descriptive markup involves the use of markup codes that identify how various portions of a document should be interpreted. For example, the code may identify one portion as a paragraph, another as a footnote and still another as a list or an item in a list.

Any software capable of processing the marked-up document will then do so using its own kind of rendering. For example, one application might gather portions identified as footnotes and print them out at the end of each page. Another might print footnotes at the end of each chapter. Still another might not print out the footnotes at all.

Another important characteristic of standard generalized markup language is its use of document types, and consequently its use of the document type definition (DTD). A particular document type is expected to have specific parts and a specific structure. For example, when there is a DTD for a report, the portions and structure of a document must follow what is defined in that DTD for the document to be considered a report. One major benefit is that documents of the same type can be processed uniformly by all software capable of processing them.

Overview of XML


XML stands for Extensible Markup Language. It is a text-based markup language derived from Standard Generalized Markup Language (SGML).
XML tags identify the data and are used to store and organize it, whereas HTML tags specify how the data is displayed. XML is not going to replace HTML in the near future, but it introduces new possibilities by adopting many successful features of HTML.
There are three important characteristics of XML that make it useful in a variety of systems and solutions −
       XML is extensible − XML allows you to create your own self-descriptive tags, or language, that suits your application.
       XML carries the data, does not present it − XML allows you to store the data irrespective of how it will be presented.
       XML is a public standard − XML was developed by an organization called the World Wide Web Consortium (W3C) and is available as an open standard.



XML Usage

XML is commonly used in the following ways −
       XML can work behind the scenes to simplify the creation of HTML documents for large web sites.
       XML can be used to exchange information between organizations and systems.
       XML can be used for offloading and reloading of databases.
       XML can be used to store and arrange data in a way that is customized to your data handling needs.
       XML can easily be merged with style sheets to create almost any desired output.
       Virtually any type of data can be expressed as an XML document.

What is Markup?


XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. So what exactly is a markup language? Markup is information added to a document that enhances its meaning in certain ways, in that it identifies the parts and how they relate to each other. More specifically, a markup language is a set of symbols that can be placed in the text of a document to demarcate and label the parts of that document.
The following example shows how XML markup looks when embedded in a piece of text −
<message>
   <text>Hello, world!</text>
</message>
This snippet includes the markup symbols, or the tags such as <message>...</message> and <text>... </text>. The tags <message> and </message> mark the start and the end of the XML code fragment.
The tags <text> and </text> surround the text Hello, world!.
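To see what "machine-readable" means in practice, here is a minimal sketch that reads the fragment above with Python's standard xml.etree.ElementTree module. The choice of Python and the variable names are assumptions made for illustration; any XML parser would do the same job.

```python
import xml.etree.ElementTree as ET

# The fragment from the text above, written on one line.
fragment = "<message><text>Hello, world!</text></message>"

root = ET.fromstring(fragment)    # parse the string into an element tree
text_element = root.find("text")  # first child element named "text"

print(root.tag)           # name of the outer element
print(text_element.text)  # character data between <text> and </text>
```

The parser hands back the labels (message, text) separately from the content they surround, which is exactly the separation that markup provides.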

Is XML a Programming Language?

A programming language consists of grammar rules and its own vocabulary, which are used to create computer programs. These programs instruct the computer to perform specific tasks. XML does not qualify as a programming language, because it does not express any computation or algorithm. It is usually stored in a simple text file and is processed by special software that is capable of interpreting XML.


Syntax


XML Declaration

The XML document can optionally have an XML declaration. It is written as follows −
<?xml version = "1.0" encoding = "UTF-8"?>
Where version is the XML version and encoding specifies the character encoding used in the document.

Syntax Rules for XML Declaration

       The XML declaration is case sensitive and must begin with "<?xml", where "xml" is written in lower case.
       If the document contains an XML declaration, it must be the first statement of the XML document, with nothing before it.
       The HTTP protocol can override the value of encoding that you put in the XML declaration.

Tags and Elements

An XML file is structured by several XML elements, also called XML nodes, and each element is written using tags. The name of an XML element is enclosed in angle brackets < > as shown below −
<element>

Syntax Rules for Tags and Elements

Element Syntax − Each XML element must be closed, either with an end tag that matches its start tag, as shown below −
<element>....</element>
or in simple-cases, just this way −
<element/>
Nesting of Elements − An XML element can contain multiple XML elements as its children, but the child elements must not overlap, i.e., the end tag of an element must have the same name as the most recent unmatched start tag.
The following example shows incorrectly nested tags −
<?xml version = "1.0"?>
<contact-info>
<company>TutorialsPoint
</contact-info>
</company>
The following example shows correctly nested tags −
<?xml version = "1.0"?>
<contact-info>
   <company>TutorialsPoint</company>
</contact-info>
Root Element − An XML document can have only one root element. For example, the following is not a correct XML document, because both the x and y elements occur at the top level without a root element −
<x>...</x>
<y>...</y>
The following example shows a correctly formed XML document −
<root>
   <x>...</x>
   <y>...</y>
</root>
Case Sensitivity − The names of XML elements are case-sensitive. That means the names in the start tag and the end tag must be written in exactly the same case.
For example, <contact-info> is different from <Contact-Info>
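The rules above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name is_well_formed is hypothetical and simply reports whether the parser accepts a document.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# Overlapping elements: </contact-info> closes before </company>.
overlapping = "<contact-info><company>TutorialsPoint</contact-info></company>"
# Properly nested elements.
nested = "<contact-info><company>TutorialsPoint</company></contact-info>"
# Two top-level elements and no single root.
two_roots = "<x>...</x><y>...</y>"
# Start and end tag differ only in case.
case_mismatch = "<contact-info>text</Contact-Info>"

for name, doc in [("overlapping", overlapping), ("nested", nested),
                  ("two_roots", two_roots), ("case_mismatch", case_mismatch)]:
    print(name, is_well_formed(doc))
```

Only the properly nested document is accepted; the other three violate the nesting, root element and case sensitivity rules respectively.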

XML Attributes

An attribute specifies a single property for the element, using a name/value pair. An XML-element can have one or more attributes. For example −
<a href = "http://www.tutorialspoint.com/">Tutorialspoint!</a>
Here href is the attribute name and http://www.tutorialspoint.com/ is attribute value.

Syntax Rules for XML Attributes

       Attribute names in XML (unlike HTML) are case sensitive. That is, HREF and href are considered two different XML attributes.
       The same attribute cannot appear twice in one tag. The following example shows incorrect syntax because the attribute b is specified twice −
<a b = "x" c = "y" b = "z">....</a>
       Attribute names are written without quotation marks, whereas attribute values must always appear in quotation marks. The following example demonstrates incorrect XML syntax −
<a b = x>....</a>
In the above syntax, the attribute value is not enclosed in quotation marks.
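As a rough illustration of these attribute rules, the sketch below feeds the examples to Python's standard xml.etree.ElementTree parser. The helper parses and the variable names are assumptions introduced for this example only.

```python
import xml.etree.ElementTree as ET

def parses(document: str) -> bool:
    """Return True if the parser accepts the document as well-formed."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

link = ET.fromstring('<a href="http://www.tutorialspoint.com/">Tutorialspoint!</a>')
href_value = link.get("href")   # value stored under the name href
upper_value = link.get("HREF")  # None, because HREF is a different name

duplicate_ok = parses('<a b="x" c="y" b="z">....</a>')  # b is specified twice
unquoted_ok = parses('<a b=x>....</a>')                 # value lacks quotes
single_quoted_ok = parses("<a b='x'>....</a>")          # single quotes are allowed

print(href_value, upper_value, duplicate_ok, unquoted_ok, single_quoted_ok)
```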

XML References

References allow you to add or include additional text or markup in an XML document. A reference always begins with the symbol "&", which is a reserved character, and ends with the symbol ";". XML has two types of references −
       Entity References − An entity reference contains a name between the start and the end delimiters. For example, in &amp; the name is amp. The name refers to a predefined string of text and/or markup.
       Character References − A character reference, such as &#65;, contains a hash mark ("#") followed by a number. The number always refers to the Unicode code point of a character. In this case, 65 refers to the letter "A".

XML Text

The names of XML elements and XML attributes are case-sensitive, which means the names in the start and end tags need to be written in the same case. To avoid character encoding problems, all XML files should be saved as Unicode UTF-8 or UTF-16 files.
Whitespace characters such as blanks, tabs and line breaks between the XML attributes inside a tag are not significant. Whitespace between XML elements is passed on by the parser, and applications commonly ignore it when it serves only as indentation.
Some characters are reserved by the XML syntax itself, so they cannot always be used directly. In particular, < and & must never appear literally in element content or attribute values. The following replacement entities are used instead −
Not Allowed Character    Replacement Entity    Character Description
<                        &lt;                  less than
>                        &gt;                  greater than
&                        &amp;                 ampersand
'                        &apos;                apostrophe
"                        &quot;                quotation mark
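In practice the replacement entities are rarely typed by hand. A minimal sketch, assuming Python's standard library: xml.sax.saxutils.escape inserts the entities for &, < and >, and the parser turns them back into the original characters. The element names note and c are placeholders chosen for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if a < b & b > c"
escaped = escape(raw)  # replaces &, < and > with their entities

# Embedding the escaped text keeps the document well-formed,
# and parsing restores the original characters.
note = ET.fromstring("<note>" + escaped + "</note>")

# A character reference is resolved by the parser in the same way.
letter = ET.fromstring("<c>&#65;</c>")

print(escaped)
print(note.text)
print(letter.text)
```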

 

XML Tags

XML tags form the foundation of XML. They define the scope of an element in XML. Related markup constructs are used to insert comments, declare settings that the parser needs, and insert special processing instructions.
We can broadly categorize XML tags as follows −

Start Tag

The beginning of every non-empty XML element is marked by a start-tag.
Following is an example of start-tag −
<address>

End Tag

Every element that has a start-tag must end with an end-tag. Following is an example of end-tag −
</address>
Note that the end tags include a solidus ("/") before the name of an element.

Empty Tag

The text that appears between start-tag and end-tag is called content. An element which has no content is termed as empty. An empty element can be represented in two ways as follows −
A start-tag immediately followed by an end-tag as shown below −
<hr></hr>
A complete empty-element tag is as shown below −
<hr />
Empty-element tags may be used for any element which has no content.


XML Tags Rules

Following are the rules that need to be followed to use XML tags −

Rule 1

XML tags are case-sensitive. The following line of code is an example of wrong syntax: the end tag </Address> differs in case from the start tag <address>, which XML treats as an error.
<address>This is wrong syntax</Address>
Following code shows a correct way, where we use the same case to name the start and the end tag.
<address>This is correct syntax</address>

Rule 2

XML tags must be closed in an appropriate order, i.e., an XML tag opened inside another element must be closed before the outer element is closed. For example −
<outer_element>
   <internal_element>
      This tag is closed before the outer_element
   </internal_element>
</outer_element>


XML-ELEMENTS

XML elements can be defined as the building blocks of an XML document. Elements can behave as containers to hold text, elements, attributes, media objects or all of these.
Each XML document contains one or more elements, the scope of which is delimited either by start and end tags or, for empty elements, by an empty-element tag.

Syntax

Following is the syntax to write an XML element −
<element-name attribute1 attribute2>
....content
</element-name>
where,
       element-name is the name of the element. The name and its case must match in the start and end tags.
       attribute1, attribute2 are attributes of the element separated by white spaces. An attribute defines a property of the element. It associates a name with a value, which is a string of characters. An attribute is written as −
name = "value"
The name is followed by an = sign and a string value inside double (" ") or single (' ') quotes.

Empty Element

An empty element (element with no content) has following syntax −
<name attribute1 attribute2.../>
Following is an example of an XML document using various XML elements −
<?xml version = "1.0"?>
<contact-info>
   <address category = "residence">
      <name>Tanmay Patil</name>
      <company>TutorialsPoint</company>
      <phone>(011) 123-4567</phone>
   </address>
</contact-info>
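To show how an application sees these elements, here is a small sketch that reads the document above with Python's standard xml.etree.ElementTree module. The variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

document = """<contact-info>
   <address category = "residence">
      <name>Tanmay Patil</name>
      <company>TutorialsPoint</company>
      <phone>(011) 123-4567</phone>
   </address>
</contact-info>"""

root = ET.fromstring(document)       # the contact-info element
address = root.find("address")       # its single child element
category = address.get("category")   # attribute value of that child

# Map each child element name to the text it contains.
fields = {child.tag: child.text for child in address}

print(category)
print(fields)
```

Here address acts as a container for three child elements, while name, company and phone each hold only text.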

XML Elements Rules

The following rules must be followed for XML elements −
       An element name can contain any alphanumeric characters, but it must begin with a letter or an underscore. The only punctuation marks allowed in names are the hyphen (-), underscore (_) and period (.). The colon (:) is reserved for namespace prefixes.
       Names are case sensitive. For example, Address, address, and ADDRESS are different names.
       The names in the start and end tags of an element must be identical.
       An element, which is a container, can contain text or elements as seen in the above example.

ATTRIBUTES

Attributes are part of XML elements. An element can have multiple attributes, each with a unique name. Attributes give more information about XML elements. To be more precise, they define properties of elements. An XML attribute is always a name-value pair.

Syntax

An XML attribute has the following syntax −
<element-name attribute1 attribute2>
....content..
</element-name>
where attribute1 and attribute2 have the following form −
name = "value"
The value has to be in double (" ") or single (' ') quotes.
Here, attribute1 and attribute2 are unique attribute labels.
Attributes are used to add a unique label to an element, place the label in a category, add a Boolean flag, or otherwise associate it with some string of data. Following example demonstrates the use of attributes −
<?xml version = "1.0" encoding = "UTF-8"?>
<!DOCTYPE garden [
   <!ELEMENT garden (plants)*>
   <!ELEMENT plants (#PCDATA)>
   <!ATTLIST plants category CDATA #REQUIRED>
]>

<garden>
   <plants category = "flowers" />
   <plants category = "shrubs">
   </plants>
</garden>
Attributes are used to distinguish among elements of the same name, when you do not want to create a new element for every situation. Hence, the use of an attribute can add a little more detail in differentiating two or more similar elements.
In the above example, we have categorized the plants by including the attribute category and assigning a different value to each of the elements. Hence, we have two categories of plants, one flowers and the other shrubs. Thus, we have two plants elements with different attribute values.
You can also observe that we have declared this attribute in the DTD at the beginning of the XML document.

Attribute Types

The following list describes the attribute types −
StringType − It takes any literal string as a value. CDATA is a StringType. CDATA is character data. This means any string of non-markup characters is a legal value of the attribute.
TokenizedType − This is a more constrained type. The validity constraints noted in the grammar are applied after the attribute value is normalized. The TokenizedType attributes are given as −
       ID − It is used to give the element a unique identifier.
       IDREF − It is used to reference an ID that has been assigned to another element.
       IDREFS − It is used to reference several IDs, written as a whitespace-separated list.
       ENTITY − It indicates that the attribute value is the name of an external entity declared in the document.
       ENTITIES − It indicates that the attribute value is a whitespace-separated list of names of external entities declared in the document.
       NMTOKEN − It is similar to CDATA, but the value is restricted to a single name token, i.e., name characters without whitespace.
       NMTOKENS − It is a whitespace-separated list of name tokens, each under the same restrictions as NMTOKEN.
EnumeratedType − This has a list of predefined values in its declaration, out of which the attribute must take one value. There are two types of enumerated attribute −
       NotationType − It declares that the attribute value refers to a NOTATION declared somewhere else in the XML document.
       Enumeration − Enumeration allows you to define a specific list of values that the attribute value must match.

Element Attribute Rules

Following are the rules that need to be followed for attributes −
       An attribute name must not appear more than once in the same start-tag or empty-element tag.
       For the document to be valid, an attribute must be declared in the Document Type Definition (DTD) using an Attribute-List Declaration.
       Attribute values must not contain direct or indirect entity references to external entities.
       The replacement text of any entity referred to directly or indirectly in an attribute value must not contain a less than sign (<).

COMMENTS

XML comments are similar to HTML comments. Comments are added as notes or lines that explain the purpose of the XML code.
Comments can be used to include related links, information, and terms. They are visible only in the source code and are not part of the document's character data. Comments may appear almost anywhere in XML code, outside other markup.

Syntax

XML comment has the following syntax −
<!--Your comment-->
A comment starts with <!-- and ends with -->. You can add textual notes as comments between the characters. You must not nest one comment inside the other.

Example

Following example demonstrates the use of comments in XML document −
<?xml version = "1.0" encoding = "UTF-8" ?>
<!--Students grades are uploaded by months-->
<class_list>
   <student>
      <name>Tanmay</name>
      <grade>A</grade>
   </student>
</class_list>
Any text between <!-- and --> characters is considered as a comment.

XML Comments Rules

The following rules should be followed for XML comments −
        Comments cannot appear before the XML declaration.
        Comments may appear anywhere else in a document, outside other markup.
        Comments must not appear within attribute values.
        Comments cannot be nested inside other comments.
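A short sketch of how a parser treats comments, assuming Python's standard xml.dom.minidom module (which keeps comment nodes in the parsed tree). The helper parses and the sample strings misplaced and double_hyphen are hypothetical and exist only for this illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

document = """<?xml version="1.0"?>
<!--Students grades are uploaded by months-->
<class_list>
   <student>
      <name>Tanmay</name>
      <grade>A</grade>
   </student>
</class_list>"""

dom = minidom.parseString(document)

# Collect the text of every comment that sits at the top level of the document.
comments = [node.data for node in dom.childNodes
            if node.nodeType == node.COMMENT_NODE]

def parses(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False

# A comment placed before the XML declaration is rejected.
misplaced = '<!--note--><?xml version="1.0"?><class_list/>'
# A double hyphen inside a comment is rejected as well.
double_hyphen = "<class_list><!-- a -- b --></class_list>"

print(comments, parses(misplaced), parses(double_hyphen))
```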

CDATA

This section describes the XML CDATA section. The term CDATA means Character Data. A CDATA section is a block of text that is not parsed by the parser, even though it contains characters that would otherwise be recognized as markup.
The predefined entities such as &lt;, &gt;, and &amp; require extra typing and are generally difficult to read in the markup. In such cases, a CDATA section can be used. By using a CDATA section, you are telling the parser that the particular section of the document contains no markup and should be treated as regular text.

Syntax

Following is the syntax for CDATA section −
<![CDATA[
   characters with markup
]]>
The above syntax is composed of three sections −
       CDATA Start section − CDATA begins with the nine-character delimiter <![CDATA[
       CDATA End section − CDATA section ends with the ]]> delimiter.
       CDATA section − Characters between these two enclosures are interpreted as characters, and not as markup. This section may contain markup characters (<, >, and &), but the XML processor does not treat them as markup.

Example

The following markup code shows an example of CDATA. Here, each character written inside the CDATA section is treated as literal text by the parser.
<script>
   <![CDATA[
      <message> Welcome to TutorialsPoint </message>
   ]]>
</script>
In the above syntax, everything between <![CDATA[ and ]]>, including the <message> and </message> tags, is treated as character data and not as markup.

CDATA Rules

The following rules must be followed for XML CDATA −
        A CDATA section cannot contain the string "]]>", because that string ends the section.
        Nesting is not allowed in CDATA section.
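The effect of a CDATA section is easy to observe with a parser. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module; the document is the example from above written on one line.

```python
import xml.etree.ElementTree as ET

document = ("<script><![CDATA[<message> Welcome to TutorialsPoint </message>]]>"
            "</script>")

script = ET.fromstring(document)

# The markup-like characters arrive as plain text of the script element...
content = script.text
# ...and no message child element is created.
child_count = len(script)

print(repr(content), child_count)
```

Without the CDATA delimiters, the same characters would have produced a message child element instead of text.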


VALIDATION

Validation is the process of checking an XML document against a set of rules. An XML document is said to be valid if its contents match the elements, attributes and structure declared in the associated Document Type Definition (DTD), and if the document complies with the constraints expressed in it. An XML parser checks a document at two levels. They are −
        Well-formed XML document
        Valid XML document

Well-formed XML Document

An XML document is said to be well-formed if it adheres to the following rules −
       XML files without a DTD must use only the predefined character entities: amp (&), apos (single quote), gt (>), lt (<) and quot (double quote).
       It must follow the ordering of the tags, i.e., the inner tag must be closed before the outer tag is closed.
       Each of its opening tags must have a closing tag, or it must be a self-ending tag (<title>....</title> or <title/>).
       An attribute name may appear only once in a start tag, and its value must be quoted.
       Entities other than amp (&), apos (single quote), gt (>), lt (<) and quot (double quote) must be declared.

Example

Following is an example of a well-formed XML document −
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address
[
   <!ELEMENT address (name,company,phone)>
   <!ELEMENT name (#PCDATA)>
   <!ELEMENT company (#PCDATA)>
   <!ELEMENT phone (#PCDATA)>
]>

<address>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</address>
The above example is said to be well-formed because −
       It defines the type of document. Here, the document type is address.
       It includes a root element named address.
       Each of the child elements name, company and phone is enclosed in its self-explanatory tag.
       The order of the tags is maintained.

Valid XML Document

An XML document is said to be valid if it is well-formed, has an associated Document Type Definition (DTD), and conforms to the constraints declared in that DTD. DTDs are described in the DTD section below.

DTD

The XML Document Type Definition, commonly known as a DTD, is a way to describe an XML language precisely. A DTD is used to check the vocabulary and the structure of an XML document against the grammatical rules of the appropriate XML language.
An XML DTD can either be specified inside the document, or it can be kept in a separate document and then linked separately.

Syntax

Basic syntax of a DTD is as follows −
<!DOCTYPE element DTD identifier
[
   declaration1
   declaration2
   ........
]>
In the above syntax,
       The DTD starts with the <!DOCTYPE delimiter.
       element names the root element and tells the parser to parse the document from the specified root element.
       DTD identifier is an identifier for the document type definition, which may be the path to a file on the system or a URL to a file on the internet. If the DTD points to an external path, it is called the External Subset.
       The square brackets [ ] enclose an optional list of declarations called the Internal Subset.

Internal DTD

A DTD is referred to as an internal DTD if the elements are declared within the XML file. When all declarations are internal, the standalone attribute in the XML declaration can be set to yes. This means the declaration works independently of an external source.

Syntax

Following is the syntax of internal DTD −
<!DOCTYPE root-element [element-declarations]>
where root-element is the name of the root element and element-declarations is where you declare the elements.

Example

Following is a simple example of internal DTD −
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address [
   <!ELEMENT address (name,company,phone)>
   <!ELEMENT name (#PCDATA)>
   <!ELEMENT company (#PCDATA)>
   <!ELEMENT phone (#PCDATA)>
]>

<address>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</address>
Let us go through the above code −
Start Declaration − Begin the document with the XML declaration, as in the following statement.
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
DTD − Immediately after the XML header, the document type declaration follows, commonly referred to as the DOCTYPE −
<!DOCTYPE address [
The DOCTYPE declaration has an exclamation mark (!) at the start of its name. The DOCTYPE informs the parser that a DTD is associated with this XML document.
DTD Body − The DOCTYPE declaration is followed by the body of the DTD, where you declare elements, attributes, entities, and notations.
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
Several elements are declared here that make up the vocabulary of the <address> document. <!ELEMENT name (#PCDATA)> defines the element name to be of type "#PCDATA". Here #PCDATA means parsed character data, i.e., text that the parser examines for markup.
End Declaration − Finally, the declaration section of the DTD is closed using a closing bracket and a closing angle bracket (]>). This effectively ends the definition, and thereafter, the XML document follows immediately.

Rules

       The document type declaration must appear at the start of the document (preceded only by the XML header) − it is not permitted anywhere else within the document.
       Similar to the DOCTYPE declaration, the element declarations must start with an exclamation mark.
       The Name in the document type declaration must match the element type of the root element.
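To make the meaning of the declaration <!ELEMENT address (name,company,phone)> concrete, here is a rough sketch of the check it implies. Python's standard library does not validate against DTDs, so this is a hand-written stand-in rather than a real validator; the function matches_address_model and the list EXPECTED_CHILDREN are hypothetical names for this illustration.

```python
import xml.etree.ElementTree as ET

# Hand-coded stand-in for the content model (name,company,phone):
# exactly these children, in exactly this order.
EXPECTED_CHILDREN = ["name", "company", "phone"]

def matches_address_model(document: str) -> bool:
    """Check a document against the address content model by hand."""
    root = ET.fromstring(document)
    if root.tag != "address":           # DOCTYPE name must match the root
        return False
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    # (#PCDATA) children may hold text only, not further elements.
    return all(len(child) == 0 for child in root)

good = ("<address><name>Tanmay Patil</name><company>TutorialsPoint</company>"
        "<phone>(011) 123-4567</phone></address>")
wrong_order = ("<address><company>TutorialsPoint</company>"
               "<name>Tanmay Patil</name><phone>(011) 123-4567</phone></address>")
missing_phone = "<address><name>Tanmay Patil</name><company>TutorialsPoint</company></address>"

print(matches_address_model(good),
      matches_address_model(wrong_order),
      matches_address_model(missing_phone))
```

All three documents are well-formed, but only the first would be valid against the DTD. A validating parser performs this kind of check for every declaration in the DTD.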

External DTD

In an external DTD, the elements are declared outside the XML file. They are accessed by specifying a system identifier, which may be either the path to a .dtd file or a valid URL. When the document relies on an external DTD, the standalone attribute in the XML declaration is set to no. This means the declaration includes information from the external source.

Syntax

Following is the syntax for external DTD −
<!DOCTYPE root-element SYSTEM "file-name">
where file-name is the file with .dtd extension.

Example

The following example shows external DTD usage −
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</address>
The content of the DTD file address.dtd is as shown −
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>

Types

You can refer to an external DTD by using either system identifiers or public identifiers.

System Identifiers

A system identifier enables you to specify the location of an external file containing DTD declarations. Syntax is as follows −
<!DOCTYPE name SYSTEM "address.dtd" [...]>
As you can see, it contains the keyword SYSTEM and a URI reference pointing to the location of the document.

Public Identifiers

Public identifiers provide a mechanism to locate DTD resources and are written as follows −
<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Address Example//EN" "address.dtd">
As you can see, it begins with the keyword PUBLIC, followed by a specialized identifier. In XML, the public identifier is followed by a system identifier, which serves as a fallback location. Public identifiers are used to identify an entry in a catalog. Public identifiers can follow any format; however, a commonly used format is called Formal Public Identifiers, or FPIs.

TREE STRUCTURE
An XML document is always descriptive. Its tree structure is often referred to as the XML tree and makes it easy to describe any XML document.
The tree structure contains a root (parent) element, child elements and so on. By using the tree structure, you can get to know all succeeding branches and sub-branches starting from the root. Parsing starts at the root, then moves down the first branch to an element, takes the first branch from there, and so on to the leaf nodes.

Example

Following example demonstrates simple XML tree structure −
<?xml version = "1.0"?>
<Company>
   <Employee>
      <FirstName>Tanmay</FirstName>
      <LastName>Patil</LastName>
      <ContactNo>1234567890</ContactNo>
      <Email>tanmaypatil@xyz.com</Email>
      <Address>
         <City>Bangalore</City>
         <State>Karnataka</State>
         <Zip>560212</Zip>
      </Address>
   </Employee>
</Company>
The tree structure of the above XML document is as follows. There is a root element named <Company>. Inside that, there is one more element, <Employee>. Inside the <Employee> element, there are five branches named <FirstName>, <LastName>, <ContactNo>, <Email>, and <Address>. Inside the <Address> element, there are three sub-branches, named <City>, <State> and <Zip>.
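The same tree can be printed directly from the document. The sketch below assumes Python's standard xml.etree.ElementTree module; the function outline is a hypothetical helper that walks the tree depth-first, in the order described earlier, and indents each element by its depth.

```python
import xml.etree.ElementTree as ET

document = """<Company>
   <Employee>
      <FirstName>Tanmay</FirstName>
      <LastName>Patil</LastName>
      <ContactNo>1234567890</ContactNo>
      <Email>tanmaypatil@xyz.com</Email>
      <Address>
         <City>Bangalore</City>
         <State>Karnataka</State>
         <Zip>560212</Zip>
      </Address>
   </Employee>
</Company>"""

def outline(element, depth=0):
    """Return one indented line per element, visiting children depth-first."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(document)
tree_lines = outline(root)
print("\n".join(tree_lines))
```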












NAMESPACE
XML Namespaces provide a method to avoid element name conflicts.

Name Conflicts

In XML, element names are defined by the developer. This often results in a conflict when trying to mix XML documents from different XML applications.
This XML carries HTML table information:
<table>
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>
This XML carries information about a table (a piece of furniture):
<table>
  <name>African Coffee Table</name>
  <width>80</width>
  <length>120</length>
</table>
If these XML fragments were added together, there would be a name conflict. Both contain a <table> element, but the elements have different content and meaning.
A user or an XML application will not know how to handle these differences.

Solving the Name Conflict Using a Prefix

Name conflicts in XML can easily be avoided using a name prefix.
This XML carries information about an HTML table, and a piece of furniture:
<h:table>
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>

<f:table>
  <f:name>African Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>
In the example above, there will be no conflict because the two <table> elements have different names.


XML Namespaces - The xmlns Attribute

When using prefixes in XML, a namespace for the prefix must be defined.
The namespace can be defined by an xmlns attribute in the start tag of an element.
The namespace declaration has the following syntax: xmlns:prefix="URI".
<root>

<h:table xmlns:h="http://www.w3.org/TR/html4/">
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>

<f:table xmlns:f="https://www.w3schools.com/furniture">
  <f:name>African Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>

</root>
In the example above:
The xmlns attribute in the first <table> element gives the h: prefix a qualified namespace.
The xmlns attribute in the second <table> element gives the f: prefix a qualified namespace.
When a namespace is defined for an element, all child elements with the same prefix are associated with the same namespace.
Namespaces can also be declared in the XML root element:
<root xmlns:h="http://www.w3.org/TR/html4/" xmlns:f="https://www.w3schools.com/furniture">

<h:table>
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>

<f:table>
  <f:name>African Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>

</root>
Note: The namespace URI is not used by the parser to look up information.
The purpose of using a URI is to give the namespace a unique name.
However, companies often use the namespace as a pointer to a web page containing namespace information.
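A parser resolves each prefix to its namespace URI, and it is the URI, not the prefix, that identifies the element. The following sketch assumes Python's standard xml.etree.ElementTree module, which reports names in the form {URI}localname. The query prefixes html and furn are arbitrary names chosen for this example and deliberately differ from h and f in the document.

```python
import xml.etree.ElementTree as ET

document = """<root xmlns:h="http://www.w3.org/TR/html4/"
      xmlns:f="https://www.w3schools.com/furniture">
<h:table>
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>
<f:table>
  <f:name>African Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>
</root>"""

root = ET.fromstring(document)

# Both children are called "table", but their expanded names differ.
tags = [child.tag for child in root]

# Prefixes used for searching are mapped to URIs independently of the document.
ns = {"html": "http://www.w3.org/TR/html4/",
      "furn": "https://www.w3schools.com/furniture"}
cells = [td.text for td in root.findall("html:table/html:tr/html:td", ns)]
furniture_name = root.find("furn:table/furn:name", ns).text

print(tags, cells, furniture_name)
```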

Uniform Resource Identifier (URI)

A Uniform Resource Identifier (URI) is a string of characters which identifies an Internet resource.
The most common URI is the Uniform Resource Locator (URL), which identifies an Internet domain address. Another, less common type of URI is the Uniform Resource Name (URN).

Default Namespaces

Defining a default namespace for an element saves us from using prefixes in all the child elements. It has the following syntax:
xmlns="namespaceURI"
This XML carries HTML table information:
<table xmlns="http://www.w3.org/TR/html4/">
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>
This XML carries information about a piece of furniture:
<table xmlns="https://www.w3schools.com/furniture">
  <name>African Coffee Table</name>
  <width>80</width>
  <length>120</length>
</table>
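A default namespace and a prefixed namespace are two spellings of the same thing. As a minimal sketch, assuming Python's standard xml.etree.ElementTree module, the two fragments below produce identical expanded element names; the variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

with_default = ('<table xmlns="http://www.w3.org/TR/html4/">'
                "<tr><td>Apples</td><td>Bananas</td></tr></table>")
with_prefix = ('<h:table xmlns:h="http://www.w3.org/TR/html4/">'
               "<h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>")

# iter() walks the element and all of its descendants in document order.
default_tags = [e.tag for e in ET.fromstring(with_default).iter()]
prefix_tags = [e.tag for e in ET.fromstring(with_prefix).iter()]

print(default_tags == prefix_tags)
print(default_tags[0])
```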

Namespaces in Real Use

XSLT is a language that can be used to transform XML documents into other formats.
The XML document below is a stylesheet used to transform XML into HTML.
The namespace "http://www.w3.org/1999/XSL/Transform" identifies the XSLT elements and distinguishes them from the HTML elements in the same document:

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
<html>
<body>
  <h2>My CD Collection</h2>
  <table border="1">
    <tr>
      <th style="text-align:left">Title</th>
      <th style="text-align:left">Artist</th>
    </tr>
    <xsl:for-each select="catalog/cd">
    <tr>
      <td><xsl:value-of select="title"/></td>
      <td><xsl:value-of select="artist"/></td>
    </tr>
    </xsl:for-each>
  </table>
</body>
</html>
</xsl:template>

</xsl:stylesheet>


XPATH

XPath can be used to navigate through elements and attributes in an XML document.
        XPath is a syntax for defining parts of an XML document.
        XPath uses path expressions to navigate in XML documents.
        XPath contains a library of standard functions.
        XPath is a major element in XSLT and in XQuery.
        XPath is a W3C recommendation.

XPath uses path expressions to select nodes or node-sets in an XML document. The node is selected by following a path or steps.

Selecting Nodes

The most useful path expressions are listed below:

Expression     Description
nodename       Selects all nodes with the name "nodename"
/              Selects from the root node
//             Selects nodes in the document from the current node that match the selection no matter where they are
.              Selects the current node
..             Selects the parent of the current node
@              Selects attributes
In the table below we have listed some path expressions and the result of the expressions:
Path Expression     Result
bookstore           Selects all nodes with the name "bookstore"
/bookstore          Selects the root element bookstore
                    Note: If the path starts with a slash ( / ) it always represents an absolute path to an element!
bookstore/book      Selects all book elements that are children of bookstore
//book              Selects all book elements no matter where they are in the document
bookstore//book     Selects all book elements that are descendants of the bookstore element, no matter where they are under the bookstore element
//@lang             Selects all attributes that are named lang
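
To try a few of these path expressions, here is a minimal Python sketch. It relies on two assumptions: the bookstore document is an invented sample, and Python's xml.etree.ElementTree implements only a subset of XPath. Paths are therefore written relative to the root element (".//book" instead of "//book"), and attribute values are read with get() because attribute nodes cannot be selected directly.

```python
import xml.etree.ElementTree as ET

# Invented sample data, used only for illustration.
BOOKSTORE = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)  # root is the <bookstore> element

children = root.findall("book")        # comparable to bookstore/book
all_books = root.findall(".//book")    # comparable to //book
parent = root.find(".//title/..")      # ".." steps up to the parent <book>

# Comparable to //@lang: collect the lang attribute of every title that has one.
langs = [t.get("lang") for t in root.iter("title") if "lang" in t.attrib]

print(root.tag, len(children), len(all_books), parent.tag, langs)
```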

Predicates

Predicates are used to find a specific node or a node that contains a specific value.
Predicates are always embedded in square brackets.
In the table below we have listed some path expressions with predicates and the result of the expressions:
Path Expression                         Result
/bookstore/book[1]                      Selects the first book element that is the child of the bookstore element
                                        Note: In IE 5, 6, 7, 8 and 9 the first node is [0], but according to W3C, it is [1]. To solve this problem in IE, set the SelectionLanguage to XPath.
                                        In JavaScript: xml.setProperty("SelectionLanguage","XPath");
/bookstore/book[last()]                 Selects the last book element that is the child of the bookstore element
/bookstore/book[last()-1]               Selects the last but one book element that is the child of the bookstore element
/bookstore/book[position()<3]           Selects the first two book elements that are children of the bookstore element
//title[@lang]                          Selects all the title elements that have an attribute named lang
//title[@lang='en']                     Selects all the title elements that have a "lang" attribute with a value of "en"
/bookstore/book[price>35.00]            Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
/bookstore/book[price>35.00]/title      Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00
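
The predicates above can be sketched in Python as follows. This is only an approximation under stated assumptions: the three-book sample is invented, and xml.etree.ElementTree supports positional and attribute predicates but not numeric comparisons or position(), so those two cases are emulated with ordinary Python.

```python
import xml.etree.ElementTree as ET

# Invented sample data, used only for illustration.
BOOKSTORE = """<bookstore>
  <book><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title lang="sv">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)

first = root.find("book[1]")               # positions start at 1, not 0
last = root.find("book[last()]")
second_last = root.find("book[last()-1]")
english = root.findall(".//title[@lang='en']")

# book[position()<3] is not supported here; slicing gives the same books.
first_two = root.findall("book")[:2]

# book[price>35.00]/title is not supported either; filter in Python instead.
expensive_titles = [
    book.findtext("title")
    for book in root.findall("book")
    if float(book.findtext("price")) > 35.00
]

print(first.findtext("title"), last.findtext("title"), expensive_titles)
```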

Selecting Unknown Nodes

XPath wildcards can be used to select unknown XML nodes.
Wildcard     Description
*            Matches any element node
@*           Matches any attribute node
node()       Matches any node of any kind
In the table below we have listed some path expressions and the result of the expressions:
Path Expression     Result
/bookstore/*        Selects all the child element nodes of the bookstore element
//*                 Selects all elements in the document
//title[@*]         Selects all title elements which have at least one attribute of any kind
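
A short Python sketch of the wildcard expressions, again with an invented sample and with the caveat that xml.etree.ElementTree does not accept the [@*] predicate, so that case is emulated by checking whether an element has any attributes:

```python
import xml.etree.ElementTree as ET

# Invented sample data; the second title deliberately has no attribute.
BOOKSTORE = """<bookstore>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title>Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)

# Comparable to /bookstore/*: every child element of bookstore.
children = root.findall("*")

# Comparable to //*: every element in the document, the root included.
all_elements = list(root.iter())

# Comparable to //title[@*]: titles that carry at least one attribute.
titles_with_attr = [t for t in root.iter("title") if t.attrib]

print(len(children), len(all_elements), [t.text for t in titles_with_attr])
```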

Selecting Several Paths

By using the | operator in an XPath expression you can select several paths.
In the table below we have listed some path expressions and the result of the expressions:
Path Expression                     Result
//book/title | //book/price         Selects all the title AND price elements of all book elements
//title | //price                   Selects all the title AND price elements in the document
/bookstore/book/title | //price     Selects all the title elements of the book element of the bookstore element AND all the price elements in the document
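
The | operator is not part of the XPath subset in Python's xml.etree.ElementTree, so the sketch below only approximates //title | //price by walking the (invented) sample document once and keeping elements whose tag is in the wanted set. A full XPath engine would accept the union expression directly.

```python
import xml.etree.ElementTree as ET

# Invented sample data, used only for illustration.
BOOKSTORE = """<bookstore>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)

# Emulates //title | //price; iter() visits elements in document order.
wanted = {"title", "price"}
selected = [elem for elem in root.iter() if elem.tag in wanted]
selected_tags = [elem.tag for elem in selected]

print(selected_tags)
```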
