
HTML to XML Conversion Sample #3

This is the 3rd of several examples describing the details of how the Chilkat HTML-to-XML library converts HTML into well-formed XML.

Here is another HTML sample. This one contains several errors (a missing </td>, stray </abc> end tags that have no matching start tag, and a final <tr> that is never closed), which the HTML-to-XML library repairs automatically:

<html>
<head>
<title>This is a test</title>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
</head>
<body>
<table>
<tr>
<td>Row 1, column 1</td>
<td>Row 1, column 2</td>
<td>Row 1, column 3 Oops forgot the ending td
</tr>
<tr>
<td>Row 2, column 1 Oops...</abc>
<td>Row 2, column 2</td>
<td>Row 2, column 3</td>
</tr>
<tr>
<td>Row 2, column 1 Oops...</abc>
<td>Row 2, <div> This is a test </div> column 2</td>
<td>Row 2, column 3</td>
<!-- Oops, forgot to close the last tr -->
</table>

</body>
</html>
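To show why these mistakes matter, here is a minimal Python sketch that feeds each of the three errors, taken in isolation, to a strict XML parser. It uses only the standard-library xml.etree.ElementTree module, not the Chilkat API. The FRAGMENTS dictionary and the is_well_formed helper are hypothetical names chosen for illustration. Every fragment is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical test data: the three mistakes from the HTML sample above,
# each reduced to a small fragment (not part of any Chilkat API).
FRAGMENTS = {
    "missing </td>": "<tr><td>Row 1, column 3 Oops forgot the ending td\n</tr>",
    "stray </abc>": "<td>Row 2, column 1 Oops...</abc>",
    "unclosed <tr>": (
        "<table><tr><td>Row 2, column 3</td>\n"
        "<!-- Oops, forgot to close the last tr -->\n</table>"
    ),
}


def is_well_formed(markup: str) -> bool:
    """Hypothetical helper: return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        print(f"  rejected: {err}")
        return False
    return True


results = {}
for name, fragment in FRAGMENTS.items():
    print(name)
    results[name] = is_well_formed(fragment)

print(results)  # every value is False
```

A strict parser stops at the first mismatched end tag, so none of these fragments can be processed as XML without first being repaired.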

The XML output is shown below.

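  • The XML below is well-formed. The stray </abc> end tags are dropped, and the unclosed <td> and <tr> elements are given closing tags.
  • HTML comments are saved within <comment> nodes.
  • All text content is placed under <text> nodes.
  • Note that the second and third <tr> elements end up nested inside the first <tr> rather than appearing as its siblings.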
<?xml version="1.0" encoding="windows-1252" ?>

<root>
    <html>
        <head>
            <title>
                <text>This is a test</text>
            </title>
            <meta http-equiv="Content-Language" content="en-us"></meta>
            <meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></meta>
        </head>
        <body>
            <table>
                <tr>
                    <td>
                        <text>Row 1, column 1</text>
                    </td>
                    <td>
                        <text>Row 1, column 2</text>
                    </td>
                    <td>
                        <text>Row 1, column 3 Oops forgot the ending td
                        </text>
                    </td>
                    <tr>
                        <td>
                            <text>Row 2, column 1 Oops...</text>
                        </td>
                        <td>
                            <text>Row 2, column 2</text>
                        </td>
                        <td>
                            <text>Row 2, column 3</text>
                        </td>
                    </tr>
                    <tr>
                        <td>
                            <text>Row 2, column 1 Oops...</text>
                        </td>
                        <td>
                            <text>Row 2, </text>
                            <div>
                                <text>This is a test </text>
                            </div>
                            <text>column 2</text>
                        </td>
                        <td>
                            <text>Row 2, column 3</text>
                        </td>
                        <comment>Oops, forgot to close the last tr</comment>
                    </tr>
                </tr>
            </table>
        </body>
    </html>
</root>
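The points listed before the output can be checked mechanically. Below is a minimal Python sketch, again using only the standard-library xml.etree.ElementTree module rather than the Chilkat API. TABLE_XML is a hand-condensed copy of the <table> portion of the output above with the indentation whitespace removed, and the collect helper is hypothetical. The sketch confirms that the output parses, pulls out the <text> and <comment> nodes in document order, and counts how the <tr> elements are nested.

```python
import xml.etree.ElementTree as ET

# Hand-condensed copy of the <table> element from the XML output above
# (indentation removed; element nesting kept exactly as shown).
TABLE_XML = (
    "<table><tr>"
    "<td><text>Row 1, column 1</text></td>"
    "<td><text>Row 1, column 2</text></td>"
    "<td><text>Row 1, column 3 Oops forgot the ending td</text></td>"
    "<tr>"
    "<td><text>Row 2, column 1 Oops...</text></td>"
    "<td><text>Row 2, column 2</text></td>"
    "<td><text>Row 2, column 3</text></td>"
    "</tr>"
    "<tr>"
    "<td><text>Row 2, column 1 Oops...</text></td>"
    "<td><text>Row 2, </text><div><text>This is a test </text></div>"
    "<text>column 2</text></td>"
    "<td><text>Row 2, column 3</text></td>"
    "<comment>Oops, forgot to close the last tr</comment>"
    "</tr>"
    "</tr></table>"
)


def collect(root: ET.Element, tag: str) -> list[str]:
    """Hypothetical helper: stripped text of every <tag> element, in document order."""
    return [(el.text or "").strip() for el in root.iter(tag)]


# fromstring raises ET.ParseError if the markup is not well-formed.
table = ET.fromstring(TABLE_XML)

texts = collect(table, "text")        # all character data lives in <text> nodes
comments = collect(table, "comment")  # HTML comments became <comment> nodes

# Rows 2 and 3 are children of row 1 rather than siblings of it.
top_rows = table.findall("tr")
nested_rows = top_rows[0].findall("tr")

print(len(texts), comments)             # 11 ['Oops, forgot to close the last tr']
print(len(top_rows), len(nested_rows))  # 1 2
```

ElementTree's iter() walks the tree depth-first, so the text runs come back in reading order. If an application expects every <tr> to be a direct child of <table>, the nesting shown here is worth handling explicitly when walking the converted XML.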

(The Chilkat HTML-to-XML API is offered across many programming languages: Ruby, Perl, Python, Java, C#, VB.NET, etc.)


Copyright 2000-2010 Chilkat Software, Inc. All rights reserved.