HTML to XML Conversion Sample #1
This is the first of several examples showing in detail how the Chilkat HTML-to-XML library converts HTML into well-formed XML.
We'll begin with the following HTML and then describe the features of the generated XML:
<html>
<head>
<title>This is a test</title>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
</head>
<body>
<h1>This is the heading</h1>
<p>Lorem ipsum dolor sit amet, <b>consectetur</b> adipisicing elit, sed do eiusmod tempor incididunt ut labore <br> et dolore magna aliqua.
<p>Ut enim ad minim veniam, <a href="http://www.google.com/">quis nostrud exercitation</a> ullamco laboris nisi ut aliquip ex ea commodo consequat.
</body>
</html>
The XML output is shown below.
- The XML uses the same encoding as the HTML. The HTML above declares charset=windows-1252, so the encoding attribute of the XML declaration is set to windows-1252.
- The root node of the XML document is always <root>, with the <html> node directly beneath it. The extra <root> node is needed because poorly formed HTML may have more than one top-level node, whereas a well-formed XML document must have exactly one root element.
- All text content is placed under <text> nodes.
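The three rules above can be illustrated with a small converter. The following is a minimal sketch in Python built on the standard library's html.parser and xml.etree.ElementTree. It is not Chilkat's implementation or API; the names SketchHtmlToXml and html_to_xml are hypothetical. It also assumes simplified HTML repair: a new <p> closes an open <p>, stray end tags are ignored, VOID_TAGS is a partial list, and UTF-8 is used when no charset is declared.

```python
import re
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumption: a partial list of HTML void elements (no content, no end tag).
VOID_TAGS = {"meta", "br", "img", "hr", "input", "link"}


class SketchHtmlToXml(HTMLParser):
    """Hypothetical converter applying the three rules above (not Chilkat's code)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("root")  # rule: one <root> wraps every top-level node
        self.stack = [self.root]        # currently open elements
        self.charset = "utf-8"          # assumed default when no charset is declared

    def handle_starttag(self, tag, attrs):
        # Simplification: a new <p> implicitly closes an open <p>.
        if tag == "p" and self.stack[-1].tag == "p":
            self.stack.pop()
        attr = {k: (v if v is not None else "") for k, v in attrs}
        elem = ET.SubElement(self.stack[-1], tag, attr)
        # rule: the XML encoding follows the charset declared in the HTML
        if tag == "meta":
            declared = attr.get("charset")
            if not declared:
                m = re.search(r"charset=([\w-]+)", attr.get("content", ""))
                declared = m.group(1) if m else None
            if declared:
                self.charset = declared
        if tag not in VOID_TAGS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # rule: all text content goes into <text> nodes (whitespace-only runs skipped)
        if data.strip():
            ET.SubElement(self.stack[-1], "text").text = data


def html_to_xml(html: str) -> str:
    parser = SketchHtmlToXml()
    parser.feed(html)
    parser.close()
    # Write empty elements as <meta ...></meta>, as in the sample output.
    body = ET.tostring(parser.root, encoding="unicode", short_empty_elements=False)
    return f'<?xml version="1.0" encoding="{parser.charset}" ?>\n{body}'
```

For example, html_to_xml("<p>a</p><p>b</p>") yields a <root> element with two <p> children, which shows why the wrapper is needed: the input has two top-level nodes, and XML allows only one document element.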
<?xml version="1.0" encoding="windows-1252" ?>
<root>
<html>
<head>
<title>
<text>This is a test</text>
</title>
<meta http-equiv="Content-Language" content="en-us"></meta>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></meta>
</head>
<body>
<h1>
<text>This is the heading</text>
</h1>
<p>
<text>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
</text>
</p>
<p>
<text>Ut enim ad minim veniam, </text>
<a href="http://www.google.com/">
<text>quis nostrud exercitation</text>
</a>
<text>ullamco laboris nisi ut aliquip ex ea commodo consequat.
</text>
</p>
</body>
</html>
</root>
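Because each run of text sits in its own <text> node, code that consumes this output typically starts at <root> and joins the <text> nodes to recover readable text. The sketch below does this with Python's standard xml.etree.ElementTree (not the Chilkat API) on a copy of the output above. The helper name paragraph_text is hypothetical, and trimming each run before joining with single spaces is an assumed whitespace policy.

```python
import xml.etree.ElementTree as ET

# Copy of the XML output above, passed as bytes so the parser reads the
# encoding from the XML declaration.
XML_OUTPUT = b"""<?xml version="1.0" encoding="windows-1252" ?>
<root>
<html>
<head>
<title>
<text>This is a test</text>
</title>
<meta http-equiv="Content-Language" content="en-us"></meta>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></meta>
</head>
<body>
<h1>
<text>This is the heading</text>
</h1>
<p>
<text>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
</text>
</p>
<p>
<text>Ut enim ad minim veniam, </text>
<a href="http://www.google.com/">
<text>quis nostrud exercitation</text>
</a>
<text>ullamco laboris nisi ut aliquip ex ea commodo consequat.
</text>
</p>
</body>
</html>
</root>"""

root = ET.fromstring(XML_OUTPUT)

# <root> is the document element; <html> sits directly beneath it.
title = root.findtext("./html/head/title/text")


def paragraph_text(p):
    # Collect every <text> descendant in document order (including the one
    # nested inside <a>), trim each run, and join the runs with single spaces.
    runs = (t.text.strip() for t in p.iter("text") if t.text and t.text.strip())
    return " ".join(runs)


paragraphs = [paragraph_text(p) for p in root.iter("p")]
links = [a.get("href") for a in root.iter("a")]
```

Here title is "This is a test", and the second paragraph reads back as a single sentence even though its text is split across three <text> nodes.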
(The Chilkat HTML-to-XML API is offered across many programming languages: Ruby, Perl, Python, Java, C#, VB.NET, etc.)