Chilkat.HtmlToText Class Overview
Chilkat.HtmlToText converts HTML into readable plain text. It can
decode HTML entities, wrap output at a configurable right margin, optionally
include or suppress hyperlink URLs, convert HTML held in a string or
StringBuilder, and read or write text files using a
specified character encoding.
What the Class Is Used For
Use Chilkat.HtmlToText when an application needs a
plain-text version of HTML content, such as converting email bodies, web page
fragments, HTML reports, or stored markup into text suitable for logging, indexing,
display in a text-only environment, or saving to a text file.
Convert HTML to Text
Convert an HTML string directly with ToText.
Use StringBuilder
Convert HTML already held in a StringBuilder with
ToTextSb.
Control Wrapping
Use RightMargin to wrap text near a desired line
width.
Control Link Output
Suppress links, list URLs as references, or emit URLs inline.
Typical Workflow
- Create an HtmlToText object.
- Optionally set DecodeHtmlEntities, RightMargin, SuppressLinks, or UncommonOptions.
- Obtain the HTML as a string, or read it from a file with ReadFileToString.
- Convert the HTML with ToText, or use ToTextSb when the HTML is already in a StringBuilder.
- Optionally save the converted text with WriteStringToFile.
- Check LastErrorText if a conversion or file operation fails or behaves unexpectedly.
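The shape of this workflow (read with a charset, convert, write with a charset) can be modeled with a short standard-library Python sketch. This is not Chilkat code and does not reproduce its conversion algorithm: `html_to_text` and `convert_file` are hypothetical names, and the tag handling is deliberately minimal (for example, it does not skip script or style content).

```python
import tempfile
from html.parser import HTMLParser
from pathlib import Path

# Tags treated as word separators in this simplified model.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3"}


class _TextExtractor(HTMLParser):
    """Collects text nodes; convert_charrefs=True decodes entities."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Hypothetical stand-in for the conversion step."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(parser.parts).split())


def convert_file(src, dst, charset="utf-8"):
    html = Path(src).read_text(encoding=charset)   # role of ReadFileToString
    text = html_to_text(html)                      # role of ToText
    Path(dst).write_text(text, encoding=charset)   # role of WriteStringToFile
    return text


# Usage: convert a small HTML file inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "page.html"
    dst = Path(tmp) / "page.txt"
    src.write_text("<p>Fish &amp; chips</p><p>Tea</p>", encoding="utf-8")
    result = convert_file(src, dst)
    saved = dst.read_text(encoding="utf-8")
```

In the real class, the three commented steps correspond to single method calls, and LastErrorText takes the place of Python exceptions as the place to look when a step fails.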
Core Concepts
| Concept | Meaning | Important Members |
| HTML Entity Decoding | Converts HTML entities such as &amp;amp; to their plain-text characters. | DecodeHtmlEntities |
| Line Wrapping | Controls where converted text is wrapped into lines. | RightMargin |
| Hyperlink Handling | Determines whether link URLs are omitted, listed as references, or emitted inline. | SuppressLinks, UncommonOptions |
| String Conversion | Converts HTML content already available in memory. | ToText, ToTextSb |
| File Helpers | Convenience methods for reading source HTML text and saving converted text. | ReadFileToString, WriteStringToFile |
Properties
| Property | Purpose | Default / Guidance |
| DecodeHtmlEntities | Controls whether HTML entities are decoded automatically. | Default is true. For example, &amp;amp; becomes &amp;. Set false to preserve entities exactly as they appear. |
| RightMargin | Controls text wrapping. | Default is 80. The converter tries to break lines at spaces near the margin. Set to 0 for no right margin. |
| SuppressLinks | Controls whether link URLs are included in the plain-text output. | Default is true, meaning link URLs are suppressed. Set false to include link URLs as references at the end of the text. |
| UncommonOptions | Comma-separated keywords for less common link-output behavior. | Normally empty. Supports NoReferencesList and EmitUrls. |
| LastErrorText | Diagnostic information for the last method or property access. | Check after failures or unexpected behavior. Diagnostic information may be available regardless of success or failure. |
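To make the effect of DecodeHtmlEntities and RightMargin concrete, the following Python sketch models both settings on a single paragraph of text that is already free of tags. It uses only the standard library and is an approximation, not Chilkat's implementation: `apply_text_options` and its parameter names are hypothetical, and the exact line-break positions chosen by the real class may differ.

```python
import html
import textwrap


def apply_text_options(text, decode_entities=True, right_margin=80):
    """Model entity decoding and wrapping for one tag-free paragraph."""
    if decode_entities:
        # Turns "&amp;" into "&", "&lt;" into "<", and so on.
        text = html.unescape(text)
    if right_margin > 0:
        # Break only at spaces; a single word longer than the margin
        # is left intact and may overrun it.
        text = textwrap.fill(
            text,
            width=right_margin,
            break_long_words=False,
            break_on_hyphens=False,
        )
    return text


decoded = apply_text_options("Fish &amp; chips")
preserved = apply_text_options("Fish &amp; chips", decode_entities=False)
wrapped = apply_text_options("aaa bbb ccc ddd", right_margin=7)
unwrapped = apply_text_options("aaa bbb ccc ddd", right_margin=0)
```

A margin of 0 skips the wrapping step entirely, which mirrors the "no right margin" behavior described in the table.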
Conversion Methods
| Method | Input | Result |
| ToText | HTML content as a string. | Returns the converted plain-text string. |
| ToTextSb | HTML content in a StringBuilder. | Converts the HTML in the supplied StringBuilder to plain text and returns true for success. |
In-memory conversion:
Use ToText for simple string input. Use
ToTextSb when the HTML is already in a
StringBuilder and should be converted there.
Hyperlink Output Behavior
| Setting | Effect | Use When |
| SuppressLinks = true | Link text is retained, but link URLs are not included in the output. | Use for clean plain text where URLs are not needed. |
| SuppressLinks = false | Link URLs are listed as references at the end of the plain-text output. | Use when the text should preserve URL information without cluttering the main text. |
| UncommonOptions includes EmitUrls | Emits hyperlink URLs inline in the plain-text output. | Use when URLs should appear immediately beside or near their link text. |
| UncommonOptions includes NoReferencesList | Prevents generation of the hyperlink references list at the end of the plain text. | Use when reference-style link output is not desired. |
Default behavior:
SuppressLinks defaults to
true, so hyperlink URLs are omitted unless link output
is explicitly enabled.
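The three link styles (suppressed, listed as references, emitted inline) can be illustrated with a small standard-library Python model. It is a sketch under stated assumptions, not Chilkat code: the `link_mode` parameter, the `[1]` markers, and the `<url>` inline form are invented for illustration, and the exact text the real class produces for references or inline URLs may look different.

```python
from html.parser import HTMLParser


class LinkAwareExtractor(HTMLParser):
    """Extracts text and handles <a href> according to link_mode."""

    def __init__(self, link_mode="suppress"):
        super().__init__(convert_charrefs=True)
        if link_mode not in {"suppress", "references", "inline"}:
            raise ValueError(f"unknown link_mode: {link_mode}")
        self.link_mode = link_mode
        self.parts = []      # pieces of the output body
        self.refs = []       # URLs collected for the references list
        self._href = None    # href of the link currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag != "a" or self._href is None:
            return
        if self.link_mode == "inline":
            # Assumed inline format: URL right after the link text.
            self.parts.append(f" <{self._href}>")
        elif self.link_mode == "references":
            # Assumed reference format: numbered marker, URL listed later.
            self.refs.append(self._href)
            self.parts.append(f" [{len(self.refs)}]")
        self._href = None

    def handle_data(self, data):
        self.parts.append(data)


def render(html, link_mode="suppress"):
    parser = LinkAwareExtractor(link_mode)
    parser.feed(html)
    parser.close()
    body = "".join(parser.parts)
    if parser.refs:
        listing = "\n".join(f"[{i}] {url}" for i, url in enumerate(parser.refs, 1))
        body += "\n\n" + listing
    return body


sample = 'See <a href="https://example.com/docs">the docs</a> now.'
suppressed = render(sample)
referenced = render(sample, "references")
inline = render(sample, "inline")
```

The model uses one mode switch to keep the example short. In the class itself the same choice is spread across SuppressLinks and the UncommonOptions keywords, so both should be reviewed together when the link output is not what was expected.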
File Helper Methods
| Method | Purpose | Character Encoding |
| ReadFileToString | Reads a text file into an in-memory string. | The srcCharset argument specifies the input file encoding, such as utf-8. |
| WriteStringToFile | Saves a string to a text file. | The charset argument specifies the output file encoding, such as utf-8. |
Encoding matters:
Choose the correct charset when reading or writing files so characters are
interpreted and saved correctly.
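The effect of a mismatched charset is easy to reproduce. The Python sketch below writes text as utf-8 and then reads it back twice, once with the matching charset and once with latin-1. It uses only the standard library; `write_text_file` and `read_text_file` are hypothetical helpers that play the roles of WriteStringToFile and ReadFileToString, not their actual implementations.

```python
import tempfile
from pathlib import Path


def write_text_file(text, path, charset):
    """Encode text with the given charset and save it."""
    Path(path).write_bytes(text.encode(charset))


def read_text_file(path, src_charset):
    """Load a file and decode its bytes with the given charset."""
    return Path(path).read_bytes().decode(src_charset)


original = "Café 5 €"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "out.txt"
    write_text_file(original, path, "utf-8")
    right = read_text_file(path, "utf-8")      # matching charset
    wrong = read_text_file(path, "latin-1")    # mismatched charset
```

With the matching charset the text round-trips unchanged. With the mismatched one, each multi-byte utf-8 sequence is decoded as several separate characters, so "é" comes back as "Ã©". No error is raised in that case, which is why the wrong charset tends to show up as garbled output rather than as a failure.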
Method Summary by Category
| Category | Methods / Properties | Purpose |
| Convert HTML | ToText, ToTextSb | Convert HTML content to plain text. |
| Control output | DecodeHtmlEntities, RightMargin, SuppressLinks, UncommonOptions | Configure entity decoding, line wrapping, and hyperlink handling. |
| Read and write files | ReadFileToString, WriteStringToFile | Load HTML text from a file or save converted text to a file. |
| Diagnostics | LastErrorText | Read diagnostic information after failed or unexpected operations. |
Diagnostics and Troubleshooting
| Problem Area | Member | What to Check |
| Output contains HTML entities | DecodeHtmlEntities | Confirm this property is set to true if entities such as &amp;amp; should be decoded. |
| Lines wrap too early or too late | RightMargin | Adjust the margin value, or set it to 0 for no right margin. |
| URLs are missing from output | SuppressLinks, UncommonOptions | Set SuppressLinks to false if link URLs should be included. |
| URLs appear in the wrong style | UncommonOptions | Use EmitUrls for inline URL output, or NoReferencesList to suppress the reference list. |
| File text has incorrect characters | ReadFileToString, WriteStringToFile | Verify that the correct source or output charset was supplied. |
| Need operation details after failure | LastErrorText | Check diagnostic text after failed or unexpected conversion, file read, or file write operations. |
Common Pitfalls
| Pitfall | Better Approach |
| Expecting URLs to appear when SuppressLinks is left at its default. | Set SuppressLinks to false when URL output is required. |
| Using the wrong charset when reading the HTML source file. | Pass the actual file encoding to ReadFileToString. |
| Saving text with the wrong output encoding. | Pass the desired output encoding to WriteStringToFile. |
| Expecting no line wrapping while leaving RightMargin at 80. | Set RightMargin to 0 for no right margin. |
| Ignoring diagnostic information after a failed conversion or file operation. | Check LastErrorText for details. |
Best Practices
| Recommendation | Reason |
| Keep DecodeHtmlEntities enabled for normal text output. | It produces readable text by converting HTML entities to their intended characters. |
| Set RightMargin based on where the text will be displayed. | Wrapped output is easier to read in fixed-width logs, emails, or console output. |
| Decide how URLs should be represented before converting. | The combination of SuppressLinks and UncommonOptions controls whether URLs are suppressed, referenced, or emitted inline. |
| Use ToTextSb when using StringBuilder-based workflows. | It avoids unnecessary string copying when the HTML is already stored in a mutable buffer. |
| Specify charsets explicitly for file input and output. | Explicit encodings help avoid mojibake or data loss when reading and writing non-ASCII text. |
| Check LastErrorText after failures. | It provides useful diagnostic detail for conversion and file I/O operations. |
Summary
Chilkat.HtmlToText is a focused utility class for
converting HTML to plain text. It supports entity decoding, configurable line
wrapping, flexible hyperlink handling, string and
StringBuilder conversion, and convenience methods for
reading and writing text files with specified character encodings.
The most important practical guidance is to choose whether URLs should be
suppressed, listed, or emitted inline; set RightMargin
for the desired wrapping behavior; keep
DecodeHtmlEntities enabled for normal readable output;
and use the correct charset when reading from or writing to files.