
Tip: Convert from HTML to XML with HTML Tidy

By Benoit Marchal
2003-12-16
Tool Of The Trade

The basic tool you can use to upgrade a site from HTML to XML is HTML Tidy. Originally developed by Dave Raggett and distributed under an open source license through the W3C Web site, HTML Tidy is now maintained by a group of volunteers at SourceForge. A Java-language version (aptly called JTidy) is also available (see Resources). Last but not least, an API allows you to integrate HTML Tidy as a library in your applications.

HTML and XML are both markup languages derived from SGML, so they have a lot in common. Still, there are two major differences:

XML syntax is far stricter; most importantly, in XML every tag must be closed.
HTML has often been written carelessly, so existing files are rarely error-free to begin with.

Early Web browsers encouraged sloppiness among webmasters by being extraordinarily tolerant of errors. At the time, the goal of these browsers was to get as many people on board as possible and to encourage webmasters to publish documents. The strategy worked, and Web content grew exponentially.
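To make the first difference concrete, here is a minimal sketch in Python (not part of the original article) that feeds two fragments to a standard XML parser. The helper name is_well_formed and both sample strings are made up for illustration. A browser would display either fragment, but an XML parser rejects the one with unclosed tags.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the string parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Typical tolerant HTML: neither <p> nor <br> is closed.
sloppy = "<html><body><p>First<p>Second<br></body></html>"

# The same content with every tag closed, as XML requires.
strict = "<html><body><p>First</p><p>Second<br/></p></body></html>"

print(is_well_formed(sloppy))
print(is_well_formed(strict))
```

Closing the tags is exactly the kind of repair a cleanup tool has to perform before an XML toolchain can read the page.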

Still, poor coding practices caused all kinds of incompatibilities, and HTML Tidy was originally designed to address this. It rewrites HTML pages to conform to the latest W3C standards. In the process, it fixes many common errors, such as unclosed tags.

Although HTML Tidy primarily works with HTML pages, it also supports XHTML, an XML vocabulary.
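As a rough sketch of how this XHTML support might be driven from a script, the Python below wraps the tidy command-line program. It assumes a tidy executable is installed and on the PATH. The function names tidy_command and convert and the file names are hypothetical, and the option set (-asxml to emit well-formed XHTML, -n for numeric entities, -q for quiet output, -o for the output file) is one plausible choice rather than the invocation the article itself uses.

```python
import shutil
import subprocess


def tidy_command(src, dest):
    """Build the argument list that asks tidy to write src as XHTML to dest."""
    return ["tidy", "-asxml", "-n", "-q", "-o", dest, src]


def convert(src, dest):
    """Run tidy on src and return its exit status (0 = clean, 1 = warnings)."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("tidy executable not found on PATH")
    result = subprocess.run(
        tidy_command(src, dest), capture_output=True, text=True
    )
    # Tidy exits with 0 on success, 1 when it issued warnings and 2 on errors,
    # so only a status of 2 or more is treated as a failure here.
    if result.returncode >= 2:
        raise RuntimeError(result.stderr)
    return result.returncode


if __name__ == "__main__":
    # Hypothetical file names; substitute your own pages.
    print(tidy_command("index.html", "index.xml"))
```

Building the argument list in a separate function keeps the options easy to inspect and adjust before any file is touched.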

As an example, I will work with a photo gallery generated with Photoshop. You can use other HTML documents, but if you'd like to experiment with the same files I use, the gallery is also available for download in the Resources section. Listing 1 is an excerpt from the gallery -- as you can see, it's plain HTML code.



First published by IBM developerWorks

