Wednesday, October 29, 2008

XML: what is it and how is it different from HTML?

XML - Extensible Markup Language

XML is a system- and hardware-independent language used to produce Unicode text files, called XML documents, which hold data and describe the structure of that data. The World Wide Web Consortium (W3C) owns and maintains the specification of this language.

Because XML documents contain data, they can be used to transport data from one system to another. An XML document also describes the structure of the data it contains, so the receiving system can interpret that data easily. This has made XML a standard for data exchange between systems, whether homogeneous or heterogeneous.
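To make the data-transport idea concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The "order" and "item" element names, the attribute names and both function names are invented for illustration; they are not part of any standard.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def build_order_xml(order_id: str, items: List[Tuple[str, int]]) -> bytes:
    """Sender side: serialize an order into an XML document."""
    root = ET.Element("order", attrib={"id": order_id})
    for name, quantity in items:
        item = ET.SubElement(root, "item", attrib={"quantity": str(quantity)})
        item.text = name
    # xml_declaration=True asks for the leading <?xml ...?> line (Python 3.8+).
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def read_order_xml(payload: bytes) -> Tuple[str, List[Tuple[str, int]]]:
    """Receiver side: recover the same data from the document's structure."""
    root = ET.fromstring(payload)
    items = [(item.text, int(item.get("quantity"))) for item in root.findall("item")]
    return root.get("id"), items


payload = build_order_xml("A-17", [("keyboard", 1), ("cable", 3)])
order_id, items = read_order_xml(payload)
print(order_id, items)
```

The receiver needs no knowledge of how the sender stores orders internally; it only relies on the structure carried by the document itself.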


XML and HTML both use tags and attributes, so they may look similar, but they differ vastly in what they are used for and what they are capable of. XML is primarily used for communication between systems and therefore concentrates on the structure of the data, whereas HTML is primarily used for presenting data and therefore concentrates on its appearance.

Another obvious difference is that HTML tags are predefined and fixed in number, and each has a specific meaning attached to it, whereas XML tag names are not predefined and carry no fixed meaning; the author of the document chooses them.
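As a small illustration of that point, the sketch below feeds a document with made-up tag names to Python's standard XML parser. The names "recipe", "ingredient" and "step" are assumptions chosen for this example; the parser accepts them because XML itself attaches no meaning to tag names.

```python
import xml.etree.ElementTree as ET

# Tag and attribute names here are invented by the document author,
# not drawn from any predefined vocabulary.
doc = """<recipe name="tea">
  <ingredient unit="ml">250</ingredient>
  <ingredient unit="bag">1</ingredient>
  <step>Boil the water.</step>
</recipe>"""

root = ET.fromstring(doc)

# Walk the tree in document order and collect every tag name found.
tags = [element.tag for element in root.iter()]
print(tags)
```

What "ingredient" or "step" means is decided by the applications that exchange the document, not by the XML language.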

XML Document Structure

The structure of an XML document is simple. It consists of two parts:
  • Prolog: it is made up of two components, the XML declaration (which gives the XML version and may also name the character encoding and state whether the document is standalone) and the document type declaration (DOCTYPE). The document type declaration references or contains a Document Type Definition (DTD), which defines the markup elements allowed in the document body. The prolog is optional, but it is normally good to have one in an XML document.
  • Document body: this part of an XML document contains the actual data and its structure. It always has a single root element, which may contain any number of sub-elements.
Processing instructions may appear in the prolog, within the document body, or after it. As the name suggests, they are instructions that tell an application to process the XML document in a particular way.

Like most languages, XML allows comments to be embedded in a document. They are meant mainly to explain the data or to give additional details to human readers of that document.
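The parts described in this section can be seen together in one small document. The sketch below parses it with Python's standard xml.dom.minidom module and lists the top-level pieces. The "note" vocabulary and the "render-hint" processing instruction are hypothetical names made up for this example.

```python
from xml.dom import Node, minidom

# The XML declaration must be the very first thing in the document.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<?render-hint style="compact"?>
<!-- A comment for human readers -->
<note>
  <to>Reader</to>
  <body>Hello</body>
</note>
"""

# Human-readable labels for the node types we expect at the top level.
LABELS = {
    Node.DOCUMENT_TYPE_NODE: "document type declaration",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
    Node.ELEMENT_NODE: "root element",
}

doc = minidom.parseString(SAMPLE)
parts = [(LABELS.get(node.nodeType, "other"), node.nodeName) for node in doc.childNodes]
for label, name in parts:
    print(f"{label}: {name}")
```

Here the DTD is written inline inside the DOCTYPE for brevity; a document may instead refer to a DTD kept in a separate file.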

Well-formed XML Doc vs Valid XML Doc

A well-formed XML document is one written according to the syntax rules of the XML specification. For example: if an XML declaration is present, it must give the version first, followed by the optional encoding and standalone information in that order; elements must be properly nested; there must be exactly one root element; and so on.
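A parser will reject a document that breaks these rules, which gives a simple way to test for well-formedness. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed is an assumption made for this example. Note that it checks well-formedness only and does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good = "<a><b>text</b></a>"
bad_nesting = "<a><b>text</a></b>"          # closing tags overlap
two_roots = "<a/><b/>"                      # more than one root element
bad_decl_order = '<?xml encoding="UTF-8" version="1.0"?><a/>'  # version must come first

for sample in (good, bad_nesting, two_roots, bad_decl_order):
    print(is_well_formed(sample), sample)
```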

A valid XML document is a well-formed XML document that also complies with its associated DTD. That is, a valid document uses only the elements declared in the referenced DTD, arranged as the DTD allows, and the DTD itself must be written according to the specification.
