Tip: Convert from HTML to XML with HTML Tidy
By Benoit Marchal
2003-12-16
Preserve Legacy Web Sites With This Handy Utility
Level: Introductory
This tip demonstrates how to convert HTML documents to XML (or more specifically, XHTML) with a simple, open source tool, HTML Tidy. This conversion is useful for webmasters who are migrating to XML. It can also help XML converts who have to interface with legacy HTML tools.
One of the challenges that webmasters face when converting from pure HTML to XML/XSL is preserving their legacy Web sites. Because dumping the old site and starting again from scratch would be too costly, they need some sort of automated procedure that brings the HTML site to XML.
Even XML converts have to deal with HTML files: many products have added an option for exporting HTML documents, an option you might want to integrate into your Web site.
This tip discusses HTML Tidy, a powerful tool to help convert old HTML pages to newer standards, such as XML. Tidy is distributed as open source.
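As a rough illustration of such an automated procedure, the sketch below wraps the Tidy command line from Python. It assumes a `tidy` executable is installed and on the PATH; the helper names `build_tidy_command` and `convert` and the file names in the usage note are hypothetical and chosen only for this example. The options shown (`-asxhtml`, `-numeric`, `-quiet`, `-o`) are standard Tidy command-line options, but check them against the version you have installed.

```python
import shutil
import subprocess


def build_tidy_command(src, dst):
    """Return the argument list for converting one HTML file to XHTML.

    Hypothetical helper for this sketch; it only builds the command.
    """
    return [
        "tidy",
        "-asxhtml",      # convert the HTML input to well-formed XHTML
        "-numeric",      # write numeric rather than named entities
        "-quiet",        # suppress non-essential output
        "-o", str(dst),  # write the result to this file
        str(src),
    ]


def convert(src, dst):
    """Run Tidy on src and report whether usable output was written."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")
    result = subprocess.run(
        build_tidy_command(src, dst), capture_output=True, text=True
    )
    # Tidy exits with 0 on success, 1 if there were warnings, 2 on errors.
    return result.returncode in (0, 1)
```

Calling `convert("legacy.html", "legacy.xhtml")` would then run Tidy on one page; looping over a directory of legacy pages is a straightforward extension.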
First published by IBM developerWorks