HTML to XML Conversion Sample #4 - HTML Text Formatting Tags
This is the fourth of several examples showing in detail how the Chilkat HTML-to-XML library converts HTML into well-formed XML.
By default, HTML text formatting tags (b, font, i, u, br, center, em, strong, big, tt, s, small, strike, sub, and sup) are dropped during the conversion. Calling the UndropTextFormattingTags method prevents these tags from being dropped.
The following HTML input is converted twice below, once with the formatting tags dropped and once with them kept.
<html><body>This <b>is</b> a <i>test</i></body></html>
The XML output with text formatting tags dropped is shown below.
<?xml version="1.0" encoding="utf-8" ?>
<root>
<html>
<body>
<text>This is a test</text>
</body>
</html>
</root>
The XML output with text formatting tags kept (that is, after calling UndropTextFormattingTags) is shown below.
<?xml version="1.0" encoding="utf-8" ?>
<root>
<html>
<body>
<text>This </text>
<b>
<text>is</text>
</b>
<text>a </text>
<i>
<text>test</text>
</i>
</body>
</html>
</root>
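The effect of dropping formatting tags can be imitated in a few lines of Python using only the standard library. This is a rough sketch for illustration, not Chilkat's implementation and not its API: the names SketchConverter, html_to_xml, and drop_formatting are hypothetical, and details such as whitespace trimming, the XML declaration, and indentation may differ from the library's actual output.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# The formatting tags listed in this sample as dropped by default.
FORMATTING_TAGS = {
    "b", "font", "i", "u", "br", "center", "em", "strong",
    "big", "tt", "s", "small", "strike", "sub", "sup",
}
# Tags that never have a closing tag (small subset, enough for a sketch).
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class SketchConverter(HTMLParser):
    """Hypothetical helper: builds an XML tree from HTML, wrapping text in <text>."""

    def __init__(self, drop_formatting=True):
        super().__init__()
        self.drop_formatting = drop_formatting
        self.root = ET.Element("root")
        self.stack = [self.root]   # currently open elements
        self.pending = []          # text fragments not yet wrapped in <text>

    def _dropped(self, tag):
        return self.drop_formatting and tag in FORMATTING_TAGS

    def _flush_text(self):
        text = "".join(self.pending)
        self.pending.clear()
        if text.strip():
            ET.SubElement(self.stack[-1], "text").text = text

    def handle_starttag(self, tag, attrs):
        if self._dropped(tag):
            return  # ignore the tag; surrounding text keeps accumulating
        self._flush_text()
        elem = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        if self._dropped(tag) or tag in VOID_TAGS:
            return
        self._flush_text()
        # Pop back to the matching open element, if there is one.
        for depth in range(len(self.stack) - 1, 0, -1):
            if self.stack[depth].tag == tag:
                del self.stack[depth:]
                break

    def handle_data(self, data):
        self.pending.append(data)

    def close(self):
        super().close()
        self._flush_text()


def html_to_xml(html, drop_formatting=True):
    """Return the converted XML as a single unindented string."""
    converter = SketchConverter(drop_formatting)
    converter.feed(html)
    converter.close()
    return ET.tostring(converter.root, encoding="unicode")


sample = "<html><body>This <b>is</b> a <i>test</i></body></html>"
print(html_to_xml(sample))                         # formatting tags dropped
print(html_to_xml(sample, drop_formatting=False))  # formatting tags kept
```

With the tags dropped, the text on either side of each formatting tag merges into a single text node, which is why the first output above contains one text element. With the tags kept, each text run is closed off whenever a b or i element opens or closes.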
(The Chilkat HTML-to-XML API is available for many programming languages, including Ruby, Perl, Python, Java, C#, and VB.NET.)