HTML to XML Conversion Sample #1

This is the first of several examples describing the details of how the Chilkat HTML-to-XML library converts HTML into well-formed XML.

We'll begin with the following HTML and then describe the features of the generated XML:

<html>
<head>
<title>This is a test</title>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
</head>
<body>
<h1>This is the heading</h1>
<p>Lorem ipsum dolor sit amet, <b>consectetur</b> adipisicing elit, sed do eiusmod tempor incididunt ut labore <br> et dolore magna aliqua.
<p>Ut enim ad minim veniam, <a href="http://www.google.com/">quis nostrud exercitation</a> ullamco laboris nisi ut aliquip ex ea commodo consequat.
</body>
</html>

The XML output is shown below.

  • The XML is written in the same encoding as the HTML. In the HTML above, the charset is windows-1252, so the encoding attribute of the XML declaration is set to windows-1252.
  • The root node of the XML document is always <root>, with the <html> node directly beneath it. The <root> wrapper exists because poorly formed HTML may contain more than one root-level node.
  • All text content is placed under <text> nodes.
<?xml version="1.0" encoding="windows-1252" ?>

<root>
    <html>
        <head>
            <title>
                <text>This is a test</text>
            </title>
            <meta http-equiv="Content-Language" content="en-us"></meta>
            <meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></meta>
        </head>
        <body>
            <h1>
                <text>This is the heading</text>
            </h1>
            <p>
                <text>Lorem ipsum dolor sit amet,  consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore  et dolore magna aliqua.
                </text>
            </p>
            <p>
                <text>Ut enim ad minim veniam, </text>
                <a href="http://www.google.com/">
                    <text>quis nostrud exercitation</text>
                </a>
                <text>ullamco laboris nisi ut aliquip ex ea commodo consequat.
                </text>
            </p>
        </body>
    </html>
</root>
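
The three rules listed above can be sketched with nothing but the Python standard library. This is not the Chilkat implementation and does not use the Chilkat API; the names HtmlToXmlSketch, html_to_xml, and VOID_TAGS are hypothetical and exist only for this illustration. The sketch assumes the charset is declared in a <meta> tag and falls back to utf-8 otherwise. It closes an open <p> when a new <p> starts, but it keeps tags such as <b> and <br>, which the sample output above omits.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Tags that never have a closing tag in HTML (a partial, assumed list).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HtmlToXmlSketch(HTMLParser):
    """Hypothetical illustration of the rules above, not Chilkat's code."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("root")   # rule 2: the root is always <root>
        self.stack = [self.root]         # currently open elements
        self.charset = "utf-8"           # assumed default when no charset is found

    def _close(self, tag):
        # Pop back to the nearest open element named `tag`, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._close("p")             # a new <p> implicitly ends an open <p>
        attrib = {name: value or "" for name, value in attrs}
        elem = ET.SubElement(self.stack[-1], tag, attrib)
        if tag == "meta":
            match = re.search(r"charset=([\w-]+)", attrib.get("content", ""))
            if match:
                self.charset = match.group(1)   # rule 1: reuse the HTML charset
        if tag not in VOID_TAGS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        self._close(tag)

    def handle_data(self, data):
        if data.strip():
            # rule 3: text content goes under a <text> node
            ET.SubElement(self.stack[-1], "text").text = " ".join(data.split())


def html_to_xml(html):
    parser = HtmlToXmlSketch()
    parser.feed(html)
    parser.close()
    body = ET.tostring(parser.root, encoding="unicode")
    return f'<?xml version="1.0" encoding="{parser.charset}" ?>\n{body}'


sample = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=windows-1252"></head>'
    '<body><p>One <a href="http://www.google.com/">two</a> three'
    '<p>Four</body></html>'
)
print(html_to_xml(sample))
```

Keeping a stack of open elements is what lets the unclosed <p> tags in the sample HTML end up as sibling <p> nodes rather than nested ones.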

(The Chilkat HTML-to-XML API is offered across many programming languages: Ruby, Perl, Python, Java, C#, VB.NET, etc.)
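
Because the result is ordinary well-formed XML, any XML parser can consume it. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, not the Chilkat API. It relies on the convention that text lives under <text> nodes, and the helper names collect_text and collect_links are hypothetical. The embedded string is an excerpt of the output above with the XML declaration left out.

```python
import xml.etree.ElementTree as ET

# Excerpt of the generated XML shown above (declaration omitted).
xml_output = """<root>
    <html>
        <body>
            <h1>
                <text>This is the heading</text>
            </h1>
            <p>
                <text>Ut enim ad minim veniam, </text>
                <a href="http://www.google.com/">
                    <text>quis nostrud exercitation</text>
                </a>
                <text>ullamco laboris nisi ut aliquip ex ea commodo consequat.
                </text>
            </p>
        </body>
    </html>
</root>"""


def collect_text(xml_string):
    """Return the content of every <text> node, in document order."""
    root = ET.fromstring(xml_string)
    return [
        " ".join(node.text.split())      # collapse line breaks and indentation
        for node in root.iter("text")
        if node.text and node.text.strip()
    ]


def collect_links(xml_string):
    """Return the href attribute of every <a> node."""
    root = ET.fromstring(xml_string)
    return [a.get("href") for a in root.iter("a")]


for line in collect_text(xml_output):
    print(line)
print(collect_links(xml_output))
```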


Copyright 2000-2010 Chilkat Software, Inc. All rights reserved.