Validating XML Documents in Java


Validating XML documents.

Before parsing an XML document in Java or any other language, we can check the XML file for validity. If we know that the document is valid and properly structured, reading it becomes much simpler. Validating XML is therefore an important part of programming.

Need for validation:

In most cases we do not know in advance the depth, the node types or the type of element content of an XML document. We then have to check each node before reading information from it, which makes the code lengthy and complex. Validating the document first removes much of that burden from our code. Without validation, such code is tedious to write and debug.

How to validate:

To specify the document structure, we have to supply a DTD or an XML schema definition. A DTD or schema contains rules about how the document should be formed, in other words how the elements of the document are organized and defined.

There are mainly two ways to supply the document structure:

  • By using a DTD
  • By using a schema definition language

In this article I will focus on writing a simple DTD that defines the structure of the document used in the previous example.

DTDs and schemas are part of XML itself and have no particular relation to the Java programming language. I’m going to give a simple demo example to illustrate the process of validating XML, and I’m pretty sure that after reading this article you will be clear about DTDs.

What is a DTD:

DTD stands for Document Type Definition. A DTD is a text-based document, usually saved with a .dtd extension, that defines the structure of an XML document. We can supply the DTD in a separate file or in the XML file itself. If we supply it in the same XML file, we must write the DTD rules within the DOCTYPE declaration.

Syntax basics for writing DTDs

A DTD contains element, attribute and entity declarations. Every component in a DTD is declared with the declaration syntax <! ... >, for example <!ELEMENT ...>.

The syntax for declaring an element is:

<!ELEMENT element-name (content model)>

Ex.

<!ELEMENT info (name,job,qualification*)>
<!ELEMENT name (#PCDATA)>

Here the content model contains the names of the child elements or the type of data in the element.

In the first line of the example, info is the name of the element and the three items inside the parentheses are its child elements. This means an info element must have one child named name, followed by one child named job, followed by zero or more children named qualification, in that order.

#PCDATA in the content model of a declaration restricts the element to text data only.

PCDATA stands for Parsed Character Data.
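The meaning of the content model (name,job,qualification*) can be checked with a small program. The following is only a sketch: the class name ContentModelCheck and the helper isValid are made up for this illustration, and the declarations for job and qualification are assumed to be text-only like name. It uses the standard JAXP validating parser and treats any validation error as "not valid".

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, written only for this illustration.
class ContentModelCheck {

	// DTD rules placed in an internal DOCTYPE so the example needs no files.
	static final String DOCTYPE =
		"<!DOCTYPE info [\n"
		+ "<!ELEMENT info (name,job,qualification*)>\n"
		+ "<!ELEMENT name (#PCDATA)>\n"
		+ "<!ELEMENT job (#PCDATA)>\n"
		+ "<!ELEMENT qualification (#PCDATA)>\n"
		+ "]>\n";

	// Returns true when the given document body satisfies the DTD rules.
	static boolean isValid(String body) throws Exception {
		DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
		factory.setValidating(true); // turn on DTD validation
		DocumentBuilder db = factory.newDocumentBuilder();
		// Rethrow validation errors so that parse() fails on an invalid document.
		db.setErrorHandler(new DefaultHandler() {
			@Override
			public void error(SAXParseException e) throws SAXException {
				throw e;
			}
		});
		try {
			db.parse(new InputSource(new StringReader(DOCTYPE + body)));
			return true;
		} catch (SAXException e) {
			return false;
		}
	}

	public static void main(String[] args) throws Exception {
		// qualification* allows zero qualification children
		System.out.println(isValid("<info><name>A</name><job>B</job></info>"));
		// job is required, so this document is rejected
		System.out.println(isValid("<info><name>A</name></info>"));
		// children must appear in the declared order
		System.out.println(isValid("<info><job>B</job><name>A</name></info>"));
	}
}
```

The first call should report a valid document, while the other two break the rule that name and job must both appear, in that order.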

Five kinds of content model are possible in an element declaration:

  • Text-only elements (#PCDATA)
  • Child-only elements (child1, child2,…, childn)
  • Empty elements (EMPTY)
  • Any elements (ANY)
  • Mixed elements (text mixed with child elements)
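To see the five kinds side by side, here is a small sketch with one declaration of each kind. The element names (note, title, body, em, separator, extra) and the class ContentModelKinds are invented for this illustration and are not part of the myinfo example. The program validates one document that follows the rules and two that break them.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; all element names below are made up.
class ContentModelKinds {

	static final String DOCTYPE =
		"<!DOCTYPE note [\n"
		+ "<!ELEMENT note (title,body,separator,extra)>\n" // child-only
		+ "<!ELEMENT title (#PCDATA)>\n"                   // text-only
		+ "<!ELEMENT body (#PCDATA|em)*>\n"                // mixed: text and em children
		+ "<!ELEMENT em (#PCDATA)>\n"
		+ "<!ELEMENT separator EMPTY>\n"                   // empty: no content allowed
		+ "<!ELEMENT extra ANY>\n"                         // any: text or any declared element
		+ "]>\n";

	// Returns true when the document body is valid against the DTD above.
	static boolean isValid(String body) throws Exception {
		DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
		factory.setValidating(true);
		DocumentBuilder db = factory.newDocumentBuilder();
		db.setErrorHandler(new DefaultHandler() {
			@Override
			public void error(SAXParseException e) throws SAXException {
				throw e; // make validation errors fail the parse
			}
		});
		try {
			db.parse(new InputSource(new StringReader(DOCTYPE + body)));
			return true;
		} catch (SAXException e) {
			return false;
		}
	}

	public static void main(String[] args) throws Exception {
		String good = "<note><title>Hi</title><body>plain <em>marked</em> text</body>"
			+ "<separator/><extra>free <em>form</em></extra></note>";
		// separator is declared EMPTY, so text inside it is a violation
		String textInEmpty = "<note><title>Hi</title><body>x</body>"
			+ "<separator>oops</separator><extra/></note>";
		// title is text-only, so a child element inside it is a violation
		String childInText = "<note><title><em>no</em></title><body>x</body>"
			+ "<separator/><extra/></note>";
		System.out.println(isValid(good));
		System.out.println(isValid(textInEmpty));
		System.out.println(isValid(childInText));
	}
}
```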

Linking a DTD with an XML document

In the XML document we use a DOCTYPE declaration to refer to the DTD. There are two types of DTD: internal and external.

With an internal DTD we write the DTD rules inside the DOCTYPE declaration. Such a DTD applies to that one XML file only. The syntax of an internal DTD looks like this:

<!DOCTYPE rootElement [DTD rules]>

For example, here is an XML file with the DTD inside it.

Internal DTD Example:

myinfo.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE info[
	<!ELEMENT info (name,job,qualification*)>
	<!ELEMENT name (#PCDATA)>
	<!ELEMENT job (#PCDATA)>
	<!ELEMENT qualification (#PCDATA)>
]>
<info>
	<name>
		Log Raj Bhatt
	</name>	
	<job>
		Blogger
	</job>
	
	<qualification>
		Metric
	</qualification>

	<qualification>
		HatTric
	</qualification>
	
</info>

In the example above, any change to the document that violates the DTD rules will be reported as an error when the file is validated.
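What such a report can look like is sketched below. This is only an illustration under a few assumptions: the class name ViolationReport and the method violations are invented here, the DTD is supplied as an internal DOCTYPE in a string, and the error handler collects the messages instead of stopping at the first one. The exact wording of the messages depends on the parser implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, written only for this illustration.
class ViolationReport {

	static final String DOCTYPE =
		"<!DOCTYPE info [\n"
		+ "<!ELEMENT info (name,job,qualification*)>\n"
		+ "<!ELEMENT name (#PCDATA)>\n"
		+ "<!ELEMENT job (#PCDATA)>\n"
		+ "<!ELEMENT qualification (#PCDATA)>\n"
		+ "]>\n";

	// Parses the document and returns every validation error that was reported.
	static List<String> violations(String body) throws Exception {
		DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
		factory.setValidating(true);
		DocumentBuilder db = factory.newDocumentBuilder();
		List<String> found = new ArrayList<>();
		db.setErrorHandler(new DefaultHandler() {
			@Override
			public void error(SAXParseException e) {
				// Record the problem and let the parser continue.
				found.add("line " + e.getLineNumber() + ": " + e.getMessage());
			}
		});
		db.parse(new InputSource(new StringReader(DOCTYPE + body)));
		return found;
	}

	public static void main(String[] args) throws Exception {
		// Someone edited the file: job was removed and an undeclared age element was added.
		String edited = "<info>\n<name>Log Raj Bhatt</name>\n<age>30</age>\n</info>";
		for (String problem : violations(edited)) {
			System.out.println(problem);
		}
	}
}
```

Collecting the errors is useful when you want to show the user everything that is wrong with the file in one pass. A document that is not even well-formed still stops the parse with an exception.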

 

Similarly, with an external DTD there are two files: the DTD file and the XML file. We have to link the DTD to the XML file. This way a single DTD can be used to validate multiple XML files.

Example of External DTD:

infoFormat.dtd

<!ELEMENT info (name,job,qualification*)>
	<!ELEMENT name (#PCDATA)>
	<!ELEMENT job (#PCDATA)>
	<!ELEMENT qualification (#PCDATA)>

myinfo.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE info SYSTEM "infoFormat.dtd">
<info>
	<name>
		Log Raj Bhatt
	</name>	
	<job>
		Blogger
	</job>
	
	<qualification>
		Metric
	</qualification>
	<qualification>
		HatTric
	</qualification>
	
</info>

The DOCTYPE declaration maps the DTD to the XML file. There are two types of external DTD: public and private. The SYSTEM keyword inside DOCTYPE indicates a private DTD (a public one uses the PUBLIC keyword instead), and info is the name of the root element of the XML file.
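The idea of one external DTD shared by several XML files can be sketched as follows. To keep the example self-contained, the content of infoFormat.dtd is held in a string and handed to the parser through an EntityResolver; in a real project the parser would normally read the file itself. The class name SharedDtdDemo and the two sample documents are assumptions made for this illustration.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, written only for this illustration.
class SharedDtdDemo {

	// Stand-in for the content of the infoFormat.dtd file.
	static final String INFO_FORMAT_DTD =
		"<!ELEMENT info (name,job,qualification*)>\n"
		+ "<!ELEMENT name (#PCDATA)>\n"
		+ "<!ELEMENT job (#PCDATA)>\n"
		+ "<!ELEMENT qualification (#PCDATA)>\n";

	// Two different documents that refer to the same external DTD.
	static final String FIRST =
		"<!DOCTYPE info SYSTEM \"infoFormat.dtd\">\n"
		+ "<info><name>Log Raj Bhatt</name><job>Blogger</job>"
		+ "<qualification>Metric</qualification></info>";
	static final String SECOND =
		"<!DOCTYPE info SYSTEM \"infoFormat.dtd\">\n"
		+ "<info><name>Someone Else</name><job>Editor</job></info>";

	// Parses and validates a document; throws SAXException if it is invalid.
	static Document parse(String xml) throws Exception {
		DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
		factory.setValidating(true);
		DocumentBuilder db = factory.newDocumentBuilder();
		// Supply the DTD text whenever the parser asks for infoFormat.dtd.
		db.setEntityResolver((publicId, systemId) ->
			systemId != null && systemId.endsWith("infoFormat.dtd")
				? new InputSource(new StringReader(INFO_FORMAT_DTD))
				: null);
		db.setErrorHandler(new DefaultHandler() {
			@Override
			public void error(SAXParseException e) throws SAXException {
				throw e; // make validation errors fail the parse
			}
		});
		return db.parse(new InputSource(new StringReader(xml)));
	}

	public static void main(String[] args) throws Exception {
		Document first = parse(FIRST);
		Document second = parse(SECOND);
		// The DOCTYPE information is available from the parsed document.
		System.out.println("root element: " + first.getDoctype().getName());
		System.out.println("system id: " + first.getDoctype().getSystemId());
		System.out.println("second root: " + second.getDocumentElement().getNodeName());
	}
}
```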

Validating XML

You can use this file in a Java program, in the same way as in the previous article, to read its content.

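package xml.main;

import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlMain {

	public static void main(String[] args) throws Exception {
		DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
		// Validate the document against the DTD named in its DOCTYPE declaration.
		factory.setValidating(true);
		// Only has an effect when validation is on: drops the whitespace-only
		// text nodes between child elements.
		factory.setIgnoringElementContentWhitespace(true);
		DocumentBuilder db = factory.newDocumentBuilder();
		// Rethrow validation errors so that an invalid file stops the program.
		db.setErrorHandler(new DefaultHandler() {
			@Override
			public void error(SAXParseException e) throws SAXException {
				throw e;
			}
		});
		// Builds the path xmls/myinfo.xml in a platform-independent way.
		File f = new File("xmls", "myinfo.xml");
		Document doc = db.parse(f);
		Element root = doc.getDocumentElement();
		System.out.println(root.getNodeName());
		// Print the name and trimmed text of every child of the root element.
		for (Node node = root.getFirstChild(); node != null; node = node.getNextSibling()) {
			String name = node.getNodeName();
			String value = node.getTextContent().trim();
			System.out.println(name + " --> " + value);
		}
	}
}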

The output of this program may look like:

Output:

info
name --> Log Raj Bhatt
job --> Blogger
qualification --> Metric
qualification --> HatTric

 

You may ask what the advantage of XML validation is. XML is mostly used to transport data and to hold configuration data. Most importantly, XML files are often used as input to a program at run time. You may have used XML files such as web.xml in web development or in various frameworks. In such cases a user can change the XML file, which may cause errors during execution. Validation catches such a file before the program relies on its content.

I will explain how to use a schema definition language in the next article. Let me know if anything above is unclear. There may be errors or mistakes in the information above; please let us know about them. Happy learning!!

 

 
