HTML to XML Conversion Sample #3
This is the third of several examples showing in detail how the Chilkat HTML-to-XML library converts HTML into well-formed XML.
Here is another HTML sample. It contains several errors (a missing </td> end tag, two stray </abc> end tags, and an unclosed final <tr>), all of which the HTML-to-XML library corrects automatically:
<html>
<head>
<title>This is a test</title>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
</head>
<body>
<table>
<tr>
<td>Row 1, column 1</td>
<td>Row 1, column 2</td>
<td>Row 1, column 3 Oops forgot the ending td
</tr>
<tr>
<td>Row 2, column 1 Oops...</abc>
<td>Row 2, column 2</td>
<td>Row 2, column 3</td>
</tr>
<tr>
<td>Row 2, column 1 Oops...</abc>
<td>Row 2, <div> This is a test </div> column 2</td>
<td>Row 2, column 3</td>
<!-- Oops, forgot to close the last tr -->
</table>
</body>
</html>
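To see why this markup needs correcting, a fragment of it can be fed to a strict XML parser. The sketch below uses Python's standard xml.etree.ElementTree, not the Chilkat API; the names BROKEN_ROW, REPAIRED_ROW and is_well_formed are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Fragment of the sample above: the last <td> is never closed.
BROKEN_ROW = """<table>
<tr>
<td>Row 1, column 1</td>
<td>Row 1, column 3 Oops forgot the ending td
</tr>
</table>"""

# The same fragment with the missing </td> added by hand.
REPAIRED_ROW = """<table>
<tr>
<td>Row 1, column 1</td>
<td>Row 1, column 3 Oops forgot the ending td
</td>
</tr>
</table>"""

def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(BROKEN_ROW))    # False: </tr> arrives while <td> is still open
print(is_well_formed(REPAIRED_ROW))  # True
```

A strict parser stops at the first mismatched tag, which is why a tolerant HTML-to-XML step is needed before ordinary XML tools can be used on markup like this.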
The XML output is shown below.
- The XML below is well-formed and the HTML errors have been corrected.
- HTML comments are saved within <comment> nodes.
- All text content is placed under <text> nodes.
<?xml version="1.0" encoding="windows-1252" ?>
<root>
<html>
<head>
<title>
<text>This is a test</text>
</title>
<meta http-equiv="Content-Language" content="en-us"></meta>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></meta>
</head>
<body>
<table>
<tr>
<td>
<text>Row 1, column 1</text>
</td>
<td>
<text>Row 1, column 2</text>
</td>
<td>
<text>Row 1, column 3 Oops forgot the ending td
</text>
</td>
<tr>
<td>
<text>Row 2, column 1 Oops...</text>
</td>
<td>
<text>Row 2, column 2</text>
</td>
<td>
<text>Row 2, column 3</text>
</td>
</tr>
<tr>
<td>
<text>Row 2, column 1 Oops...</text>
</td>
<td>
<text>Row 2, </text>
<div>
<text>This is a test </text>
</div>
<text>column 2</text>
</td>
<td>
<text>Row 2, column 3</text>
</td>
<comment>Oops, forgot to close the last tr</comment>
</tr>
</tr>
</table>
</body>
</html>
</root>
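Because text always sits under <text> nodes and comments under <comment> nodes, the output can be walked with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree rather than the Chilkat API; the XML_OUTPUT string is an abbreviated copy of the listing above, and the collect helper is a hypothetical name chosen for this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the converter output shown above (part of the last row).
# The XML declaration is left out for brevity.
XML_OUTPUT = """<root>
<html>
<body>
<table>
<tr>
<td>
<text>Row 2, </text>
<div>
<text>This is a test </text>
</div>
<text>column 2</text>
</td>
<td>
<text>Row 2, column 3</text>
</td>
<comment>Oops, forgot to close the last tr</comment>
</tr>
</table>
</body>
</html>
</root>"""

def collect(xml_string):
    """Return (texts, comments) gathered from <text> and <comment> nodes."""
    root = ET.fromstring(xml_string)  # raises ParseError if not well-formed
    # iter() visits matching elements in document order.
    texts = [node.text or "" for node in root.iter("text")]
    comments = [node.text or "" for node in root.iter("comment")]
    return texts, comments

texts, comments = collect(XML_OUTPUT)
print(texts)     # ['Row 2, ', 'This is a test ', 'column 2', 'Row 2, column 3']
print(comments)  # ['Oops, forgot to close the last tr']
```

Note that the text of a cell containing nested markup is split across several <text> nodes, so the pieces must be joined in document order to rebuild the cell's full text.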
(The Chilkat HTML-to-XML API is available in many programming languages, including Ruby, Perl, Python, Java, C#, and VB.NET.)