HTML-to-XML Component Features
The Chilkat HTML-to-XML component transforms HTML into well-formed XML so that it can be parsed. In effect, it is an HTML parser/scraper: once HTML is converted to XHTML (i.e. well-formed XML), the many existing XML parsing components and libraries can be used for HTML parsing and scraping.
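The convert-then-parse idea can be illustrated with Python's standard library. The sketch below is not Chilkat's implementation or API. It uses `html.parser.HTMLParser` to re-emit tag-soup HTML as well-formed XML: it closes unclosed elements, self-closes void elements such as `<br>`, quotes attributes, and re-escapes text. It then hands the result to `xml.etree.ElementTree`. The names `XhtmlWriter` and `html_to_xml` and the wrapper element `<document>` are hypothetical, and the repair rules are much simpler than a real HTML parser's.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# HTML elements that never have content or an end tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}
# Elements implicitly closed when a new sibling of the same type starts
# (a simplified subset of the real HTML rules).
SELF_SIBLING_CLOSE = {"p", "li", "option", "tr", "td", "th"}


class XhtmlWriter(HTMLParser):
    """Re-emits tag-soup HTML as well-formed XML (simplified illustration)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp;, &nbsp;, &#233; ...
        self.out = []
        self.stack = []  # currently open non-void elements

    def _close_top(self):
        self.out.append(f"</{self.stack.pop()}>")

    def handle_starttag(self, tag, attrs):
        if tag in SELF_SIBLING_CLOSE and self.stack and self.stack[-1] == tag:
            self._close_top()  # "<p>a<p>b" becomes "<p>a</p><p>b"
        seen = {}
        for name, value in attrs:
            # Keep the first duplicate attribute; give bare attributes a value.
            seen.setdefault(name, name if value is None else value)
        attr_text = "".join(f" {n}={quoteattr(v)}" for n, v in seen.items())
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag (or the end half of "<br/>"): ignore it
        # Close everything opened after `tag`, then `tag` itself.
        while self.stack:
            top = self.stack[-1]
            self._close_top()
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # re-escape &, <, > for XML

    def result(self):
        self.close()  # flush any buffered text
        while self.stack:
            self._close_top()
        # One root element so the output is a single XML document.
        return "<document>" + "".join(self.out) + "</document>"


def html_to_xml(html: str) -> str:
    writer = XhtmlWriter()
    writer.feed(html)
    return writer.result()


if __name__ == "__main__":
    soup = ("<p>Fish &amp; chips<p>Price: 5&nbsp;EUR<br>"
            "<b><i>bold</b></i><input disabled>")
    root = ET.fromstring(html_to_xml(soup))  # a standard XML parser now accepts it
    for p in root.iter("p"):
        print("".join(p.itertext()))
```

Because the repaired output is ordinary XML, any XML toolkit can query it. The implicit-close rules here cover only a few common cases. Real-world HTML needs the full tree-construction rules of the HTML parsing algorithm.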
The component also includes HTML-to-plain-text conversion. The internal conversion process is considerably more sophisticated than the simple regular-expression freeware code found on the Internet: it does more than simply remove the tags from an HTML document.
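To illustrate why plain-text conversion is more than tag removal, the following sketch compares a naive regular expression with a small `HTMLParser`-based extractor. This is plain Python, not the component's API. The extractor skips `<script>`/`<style>` content, decodes entities such as `&amp;`, collapses HTML whitespace, and turns block elements and `<br>` into line breaks. The tag sets and the names `html_to_text` and `naive_strip` are assumptions chosen for the example.

```python
import re
from html.parser import HTMLParser

# Elements whose start and end mark a line break in the plain-text output.
BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "tr", "table", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"}
# Void elements that produce a line break on their own.
BREAK_TAGS = {"br", "hr"}
# Elements whose content is code or styling, not readable text.
SKIP_TAGS = {"script", "style"}
# HTML whitespace only; U+00A0 (from &nbsp;) is deliberately not collapsed.
HTML_SPACE = re.compile(r"[ \t\r\n\f]+")


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp;, &lt;, &#233; ...
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "li":
            self.parts.append("\n- ")  # render list items as bullets
        elif tag in BLOCK_TAGS or tag in BREAK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(HTML_SPACE.sub(" ", data))


def html_to_text(html: str) -> str:
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    lines = (line.strip(" ") for line in "".join(extractor.parts).split("\n"))
    return "\n".join(line for line in lines if line)  # drop blank lines


def naive_strip(html: str) -> str:
    return re.sub(r"<[^>]+>", "", html)  # the regex approach, for comparison


if __name__ == "__main__":
    page = ('<html><head><style>p { color: red }</style></head>\n'
            '<body><h1>Menu</h1><p>Fish &amp; chips<br>Tea &lt;hot&gt;</p>\n'
            '<ul><li>One</li><li>Two</li></ul>'
            '<script>var x = "<b>";</script></body></html>')
    print(naive_strip(page))   # CSS, script code and raw entities survive
    print(html_to_text(page))
```

The regex version leaves the stylesheet, the script body, and undecoded entities in the text, and runs adjacent blocks together. The parser-based version avoids those problems, although a full converter would also need to handle tables, `<pre>` whitespace, and similar cases.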
- File-to-file HTML to XML conversion.
- Memory-to-memory HTML to XML conversion.
- Convert the character encoding during the conversion process.
- Flexibility in controlling how HTML entities are handled.
- Automatically convert HTML entities to corresponding 8-bit characters.
- Optionally drop all text formatting tags from the output.
- Drop/undrop specific tags from the output.
- HTML to plain-text conversion.
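Several of the features listed above (dropping text-formatting tags, dropping/undropping specific tags, and converting the output's character encoding) can be sketched on top of well-formed XHTML with `xml.etree.ElementTree`. This is an illustration under stated assumptions, not the component's interface. `TagDropper`, `drop_tags`, and `FORMATTING_TAGS` are hypothetical names, and the component's actual list of formatting tags may differ. Here, dropping a tag means unwrapping it: the element disappears, but its text and children stay in place.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of "text formatting" tags; the product's list may differ.
FORMATTING_TAGS = {"b", "i", "u", "em", "strong", "font", "small", "big",
                   "s", "strike", "tt"}


def _add_text(parent, kept, text):
    """Append text right after the last kept child, or to parent.text if none."""
    if not text:
        return
    if kept:
        kept[-1].tail = (kept[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def drop_tags(elem, tags):
    """Unwrap every descendant whose tag is in `tags`, keeping its content."""
    kept = []
    for child in list(elem):
        drop_tags(child, tags)          # clean the child's subtree first
        if child.tag in tags:
            _add_text(elem, kept, child.text)
            kept.extend(child)          # promote grandchildren (with their tails)
            _add_text(elem, kept, child.tail)
        else:
            kept.append(child)
    for child in list(elem):
        elem.remove(child)
    elem.extend(kept)


class TagDropper:
    """Hypothetical drop/undrop configuration applied to well-formed XHTML."""

    def __init__(self, drop_formatting=False):
        self.dropped = set(FORMATTING_TAGS) if drop_formatting else set()

    def drop(self, tag):
        self.dropped.add(tag.lower())

    def undrop(self, tag):
        self.dropped.discard(tag.lower())

    def convert(self, xhtml, out_charset="utf-8"):
        root = ET.fromstring(xhtml)
        drop_tags(root, self.dropped)   # the root element itself is never dropped
        # Characters missing from out_charset become &#NNNN; references.
        return ET.tostring(root, encoding=out_charset)


if __name__ == "__main__":
    src = '<p>Caf\u00e9 <b>menu</b>: <font color="red">5 \u20ac</font> <i>today</i></p>'
    dropper = TagDropper(drop_formatting=True)  # drop all formatting tags ...
    dropper.undrop("i")                          # ... except <i>
    print(dropper.convert(src, "iso-8859-1"))
```

When encoding to `iso-8859-1`, `é` is stored as a single 8-bit byte. A character outside that charset, such as `€`, is written as the numeric reference `&#8364;`, so the output remains valid in the target encoding.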
Copyright 2000-2010 Chilkat Software, Inc. All rights reserved.
Send feedback to support@chilkatsoft.com.
Components for Microsoft Windows 7, Vista, XP, 2000, 2003 Server, and Windows 95/98/NT4.