HTML-to-XML Component Features
The Chilkat HTML-to-XML component transforms HTML into well-formed XML so that it can be parsed. In effect, it is an HTML parser / scraper. Once HTML is converted to XHTML (i.e. well-formed XML), the many existing XML parsing components and libraries can be used for HTML parsing and scraping.
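To illustrate the general idea of turning loosely written HTML into well-formed XML and then handing it to an ordinary XML parser, here is a minimal sketch in Python. It uses only the standard library and is not the Chilkat API: the class `HtmlToXmlSketch`, the function `html_to_xml`, the `<root>` wrapper element, and the sample markup are all assumptions made for this example, and the tag-balancing logic is far simpler than what a production converter would need.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HtmlToXmlSketch(HTMLParser):
    """Hypothetical helper that re-emits HTML as well-formed XML."""

    def __init__(self):
        # convert_charrefs=True delivers entities such as &amp; as characters.
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # dict() drops duplicate attributes; a valueless attribute such as
        # <input disabled> gets its own name as its value.
        attr_text = "".join(
            " %s=%s" % (name, quoteattr(value if value is not None else name))
            for name, value in dict(attrs).items()
        )
        if tag in VOID_TAGS:
            self.out.append("<%s%s/>" % (tag, attr_text))
        else:
            self.out.append("<%s%s>" % (tag, attr_text))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or tag not in self.open_tags:
            return  # stray or redundant closing tag: ignore it
        # Close every element still open inside the one being closed.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # re-escape &, < and > for XML

    def result(self):
        self.close()
        while self.open_tags:  # close anything left open at end of input
            self.out.append("</%s>" % self.open_tags.pop())
        return "<root>%s</root>" % "".join(self.out)


def html_to_xml(html):
    parser = HtmlToXmlSketch()
    parser.feed(html)
    return parser.result()


# The sample has an unquoted attribute, an unclosed <br> and an unclosed <a>.
xml_text = html_to_xml('<p class=intro>Fish &amp; chips<br><a href="/menu">Menu</p>')
root = ET.fromstring(xml_text)  # any XML parser can now read the result
print([a.get("href") for a in root.iter("a")])
```

The sketch ignores comments, doctypes, namespaces, and HTML's implied-end-tag rules (for example, consecutive `<li>` items end up nested), so it only shows the shape of the approach.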
The component also includes HTML to plain-text conversion. The internal conversion process is considerably more sophisticated than the simple regular-expression freeware code found on the Internet: it does more than remove HTML tags from an HTML document.
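As a rough illustration of why plain-text conversion involves more than deleting tags, the following Python sketch compares a naive regular expression with a small parser-based converter. It uses only the standard library and does not reflect how the Chilkat component works internally: `PlainTextSketch`, `html_to_text`, the two tag sets, and the sample markup are assumptions made for this example.

```python
from html.parser import HTMLParser
import re

# Tags whose boundaries should become line breaks in the text output.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "ul", "ol", "table",
              "blockquote", "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose content is never visible text.
SKIP_TAGS = {"script", "style"}


class PlainTextSketch(HTMLParser):
    """Hypothetical helper that collects the visible text of an HTML page."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &eacute; for us.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Source whitespace (including newlines) collapses to one space.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html):
    parser = PlainTextSketch()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.parts).split("\n")
    lines = [re.sub(r" +", " ", line).strip() for line in lines]
    return "\n".join(line for line in lines if line)


sample = ("<style>p{color:red}</style><p>Caf&eacute; &amp; bar</p>"
          "<p>Open <b>daily</b></p><script>var x = 1;</script>")

# Naive approach: strip anything that looks like a tag.
naive = re.sub(r"<[^>]+>", "", sample)

print(repr(naive))
print(repr(html_to_text(sample)))
```

In this sample the naive version keeps the stylesheet and script text, leaves the entities undecoded, and runs the two paragraphs together, while the parser-based version drops the hidden content, decodes the entities, and separates the paragraphs with a line break.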
- File-to-file HTML to XML conversion.
- Memory-to-memory HTML to XML conversion.
- Convert character encoding during conversion process.
- Flexibility in controlling how HTML entities are handled.
- Automatically convert HTML entities to corresponding 8-bit characters.
- Optionally drop all text formatting tags from the output.
- Drop/undrop specific tags from the output.
- HTML to plain-text conversion.
- Thread safe.
Copyright 2000-2015 Chilkat Software, Inc. All rights reserved.
Software components and libraries for Linux, Mac OS X, iOS, Android™, Solaris, HP-UX, RHEL/CentOS, FreeBSD, MinGW, Windows 10, Windows 8, Windows Server 2012, Windows 7, Vista, XP, 2003 Server, 2008 Server, etc.