Developer Spot - Web Development Tutorials
 



Tip: Convert from HTML to XML with HTML Tidy

By Benoit Marchal
2003-12-16
Preserve Legacy Web Sites With This Handy Utility

Level: Introductory

This tip demonstrates how to convert HTML documents to XML (or more specifically, XHTML) with a simple, open source tool, HTML Tidy. This conversion is useful for webmasters who are migrating to XML. It can also help XML converts who have to interface with legacy HTML tools.
One of the challenges that webmasters face when converting from pure HTML to XML/XSL is preserving their legacy Web sites. Because it would be too costly to dump the old site and start again from scratch, some sort of automated procedure that brings the HTML site to XML is required.

Even XML converts have to deal with HTML files: Many products have added an option for exporting HTML documents -- an option you might want to integrate into your Web site.

This tip discusses HTML Tidy, a powerful tool to help convert old HTML pages to newer standards, such as XML. Tidy is distributed as open source.
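
As a rough illustration of the kind of conversion described above, here is a minimal Python sketch that wraps the Tidy command line. It is not taken from the article: the helper names build_tidy_command and convert are hypothetical, the sketch assumes a tidy executable is on the PATH, and it assumes a Tidy build that accepts the -output option (older builds may need output redirection instead). The file names index.html and index.xml follow the listings named in this article.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_tidy_command(source: Path, target: Path) -> list[str]:
    """Build an argument list asking Tidy to write `source` as XHTML to `target`.

    Hypothetical helper for illustration; only the Tidy flags are real.
    """
    return [
        "tidy",
        "-asxml",    # convert HTML to well-formed XHTML
        "-numeric",  # write numeric entities instead of named ones
        "-quiet",    # suppress nonessential output
        "-output", str(target),
        str(source),
    ]


def convert(source: Path, target: Path) -> int:
    """Run Tidy and return its exit status (0 = clean, 1 = warnings)."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("tidy executable not found on PATH (assumed installed)")
    result = subprocess.run(build_tidy_command(source, target), check=False)
    # Tidy reports 2 when it found errors it could not fix.
    if result.returncode >= 2:
        raise RuntimeError(f"Tidy could not repair {source}")
    return result.returncode


if __name__ == "__main__":
    convert(Path("index.html"), Path("index.xml"))
```

Keeping the argument list in its own function makes it easy to inspect or adjust the flags before anything is executed, for example when converting a whole directory of legacy pages in a loop.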



Article Pages:
Preserve Legacy Web Sites With This Handy Utility
Tool Of The Trade
Listing 1. index.html (an excerpt)
Tidying Up
Listing 2. index.xml (an excerpt)
Further Processing
Listing 3. index-transform.xml (an excerpt)
Listing 4. cleanup.xsl
Conclusion

First published by IBM developerWorks







 