Chilkat.HtmlToText Class Overview

Chilkat.HtmlToText converts HTML into readable plain text. It can decode HTML entities, wrap output at a configurable right margin, optionally include or suppress hyperlink URLs, convert HTML held in a string or StringBuilder, and read or write text files using a specified character encoding.

What the Class Is Used For

Use Chilkat.HtmlToText when an application needs a plain-text version of HTML content, such as converting email bodies, web page fragments, HTML reports, or stored markup into text suitable for logging, indexing, display in a text-only environment, or saving to a text file.

Convert HTML to Text: Convert an HTML string directly with ToText.
Use StringBuilder: Convert HTML already held in a StringBuilder with ToTextSb.
Control Wrapping: Use RightMargin to wrap text near a desired line width.
Control Link Output: Suppress links, list URLs as references, or emit URLs inline.

Typical Workflow

  1. Create an HtmlToText object.
  2. Optionally set DecodeHtmlEntities, RightMargin, SuppressLinks, or UncommonOptions.
  3. Obtain the HTML as a string, or read it from a file with ReadFileToString.
  4. Convert the HTML with ToText, or use ToTextSb when the HTML is already in a StringBuilder.
  5. Optionally save the converted text with WriteStringToFile.
  6. Check LastErrorText if a conversion or file operation fails or behaves unexpectedly.
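
The steps above can be sketched end to end. The following is a minimal illustration written with Python's standard library only; it is not Chilkat code and does not reproduce Chilkat's conversion rules. The names html_to_text and convert_file are hypothetical helpers invented for this sketch, and the tag stripping is deliberately naive (it does not insert breaks between block elements).

```python
import tempfile
import textwrap
from html.parser import HTMLParser
from pathlib import Path


class _TextCollector(HTMLParser):
    """Collects character data and ignores tags (a naive stand-in converter)."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as &amp;.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(markup, right_margin=80):
    """Hypothetical helper: strip tags, collapse whitespace, wrap at right_margin."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    text = " ".join("".join(collector.parts).split())
    if right_margin > 0:
        return textwrap.fill(text, width=right_margin)
    return text  # 0 means no right margin in this sketch


def convert_file(src_path, dst_path, charset="utf-8", right_margin=80):
    """Workflow steps 3 to 5: read the HTML, convert it, save the text."""
    markup = Path(src_path).read_text(encoding=charset)
    text = html_to_text(markup, right_margin)
    Path(dst_path).write_text(text, encoding=charset)
    return text


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "page.html"
        dst = Path(tmp) / "page.txt"
        src.write_text("<p>Fish &amp; chips are <b>great</b>.</p>", encoding="utf-8")
        print(convert_file(src, dst, right_margin=40))
```

With the real class, the same shape applies: configure the properties first, convert, then write the result with an explicit charset.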

Core Concepts

Concept | Meaning | Important Members
HTML Entity Decoding | Converts HTML entities such as &amp; to their plain-text characters. | DecodeHtmlEntities
Line Wrapping | Controls where converted text is wrapped into lines. | RightMargin
Hyperlink Handling | Determines whether link URLs are omitted, listed as references, or emitted inline. | SuppressLinks, UncommonOptions
String Conversion | Converts HTML content already available in memory. | ToText, ToTextSb
File Helpers | Convenience methods for reading source HTML text and saving converted text. | ReadFileToString, WriteStringToFile
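
To make the entity-decoding concept concrete, here is a small illustration using Python's standard html module. It is not Chilkat code; it only shows what "decoded" and "preserved" entity text look like, which is the distinction DecodeHtmlEntities controls.

```python
import html

encoded = "Fish &amp; chips &lt;fresh&gt; &copy; 2024"

# Decoding on: each entity becomes the character it stands for.
decoded = html.unescape(encoded)

# Decoding off: the text is passed through exactly as written.
preserved = encoded

print(decoded)    # Fish & chips <fresh> © 2024
print(preserved)  # Fish &amp; chips &lt;fresh&gt; &copy; 2024
```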

Properties

Property | Purpose | Default / Guidance
DecodeHtmlEntities | Controls whether HTML entities are decoded automatically. | Default is true. For example, &amp; becomes &. Set false to preserve entities exactly as they appear.
RightMargin | Controls text wrapping. | Default is 80. The converter tries to break lines at spaces near the margin. Set to 0 for no right margin.
SuppressLinks | Controls whether link URLs are included in the plain-text output. | Default is true, meaning link URLs are suppressed. Set false to include link URLs as references at the end of the text.
UncommonOptions | Comma-separated keywords for less common link-output behavior. | Normally empty. Supports NoReferencesList and EmitUrls.
LastErrorText | Diagnostic information for the last method or property access. | Check after failures or unexpected behavior. Diagnostic information may be available regardless of success or failure.
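
The RightMargin behavior (break at spaces near the margin, 0 for no margin) can be approximated with Python's textwrap module. This is an illustration of the idea only; wrap_at_margin is a hypothetical helper, and Chilkat's exact break positions may differ.

```python
import textwrap


def wrap_at_margin(text, right_margin=80):
    """Hypothetical helper: break lines at spaces; 0 means no right margin."""
    if right_margin <= 0:
        return text
    # break_long_words=False keeps line breaks at spaces only.
    return textwrap.fill(text, width=right_margin, break_long_words=False)


sample = "The quick brown fox jumps over the lazy dog near the river bank."
print(wrap_at_margin(sample, 30))  # wrapped so that no line exceeds 30 columns
print(wrap_at_margin(sample, 0))   # left as a single line
```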

Conversion Methods

Method | Input | Result
ToText | HTML content as a string. | Returns the converted plain-text string.
ToTextSb | HTML content in a StringBuilder. | Converts the HTML in the supplied StringBuilder to plain text and returns true for success.

In-memory conversion: Use ToText for simple string input. Use ToTextSb when the HTML is already in a StringBuilder and should be converted there.

Hyperlink Output Behavior

Setting | Effect | Use When
SuppressLinks = true | Link text is retained, but link URLs are not included in the output. | Use for clean plain text where URLs are not needed.
SuppressLinks = false | Link URLs are listed as references at the end of the plain-text output. | Use when the text should preserve URL information without cluttering the main text.
EmitUrls (UncommonOptions keyword) | Emits hyperlink URLs inline in the plain-text output. | Use when URLs should appear immediately beside or near their link text.
NoReferencesList (UncommonOptions keyword) | Prevents generation of the hyperlink references list at the end of the plain text. | Use when reference-style link output is not desired.

Default behavior: SuppressLinks defaults to true, so hyperlink URLs are omitted unless link output is explicitly enabled.
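
The three link styles can be compared side by side with a small standard-library sketch. This is not Chilkat code: the LinkAwareText class, its mode names, and the "[1]" and "<url>" output formats are assumptions chosen for illustration, and Chilkat's actual reference and inline formatting may look different. Roughly, "suppress" corresponds to SuppressLinks = true, "references" to SuppressLinks = false, and "inline" to the EmitUrls keyword.

```python
from html.parser import HTMLParser


class LinkAwareText(HTMLParser):
    """Illustrative converter with modes 'suppress', 'references', 'inline'."""

    def __init__(self, mode="suppress"):
        super().__init__(convert_charrefs=True)
        self.mode = mode
        self.parts = []    # pieces of body text
        self.urls = []     # collected hrefs for the references list
        self._href = None  # href of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or not self._href:
            return
        if self.mode == "references":
            self.urls.append(self._href)
            self.parts.append(f" [{len(self.urls)}]")
        elif self.mode == "inline":
            self.parts.append(f" <{self._href}>")
        self._href = None

    def result(self):
        body = "".join(self.parts)
        if self.mode == "references" and self.urls:
            refs = "\n".join(f"[{i}] {u}" for i, u in enumerate(self.urls, 1))
            return f"{body}\n\n{refs}"
        return body


def convert(markup, mode):
    parser = LinkAwareText(mode)
    parser.feed(markup)
    parser.close()
    return parser.result()


markup = 'See the <a href="https://example.com/docs">documentation</a> for details.'
for mode in ("suppress", "references", "inline"):
    print(f"--- {mode} ---")
    print(convert(markup, mode))
```

In every mode the link text itself ("documentation") is kept; only the treatment of the URL changes.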

File Helper Methods

Method | Purpose | Character Encoding
ReadFileToString | Reads a text file into an in-memory string. | The srcCharset argument specifies the input file encoding, such as utf-8.
WriteStringToFile | Saves a string to a text file. | The charset argument specifies the output file encoding, such as utf-8.

Encoding matters: Choose the correct charset when reading or writing files so characters are interpreted and saved correctly.
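
The effect of a mismatched charset is easy to reproduce. The sketch below uses Python's standard library rather than ReadFileToString or WriteStringToFile; it writes a file as utf-8 and then reads it back once with the correct charset and once with the wrong one.

```python
import tempfile
from pathlib import Path

text = "Café, naïve, résumé"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"

    # Save with an explicit output encoding.
    path.write_text(text, encoding="utf-8")

    # Reading with the same encoding round-trips the text.
    right = path.read_text(encoding="utf-8")

    # Reading with a different encoding misinterprets the multi-byte characters.
    wrong = path.read_text(encoding="iso-8859-1")

print(right == text)  # True
print(wrong)          # CafÃ©, naÃ¯ve, rÃ©sumÃ©
```

The same reasoning applies to the class's file helpers: the charset passed when reading should match how the file was actually written.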

Method Summary by Category

Category | Methods / Properties | Purpose
Convert HTML | ToText, ToTextSb | Convert HTML content to plain text.
Control output | DecodeHtmlEntities, RightMargin, SuppressLinks, UncommonOptions | Configure entity decoding, line wrapping, and hyperlink handling.
Read and write files | ReadFileToString, WriteStringToFile | Load HTML text from a file or save converted text to a file.
Diagnostics | LastErrorText | Read diagnostic information after failed or unexpected operations.

Diagnostics and Troubleshooting

Problem Area | Member | What to Check
Output contains HTML entities | DecodeHtmlEntities | Confirm this property is set to true if entities such as &amp; should be decoded.
Lines wrap too early or too late | RightMargin | Adjust the margin value, or set it to 0 for no right margin.
URLs are missing from output | SuppressLinks, UncommonOptions | Set SuppressLinks to false if link URLs should be included.
URLs appear in the wrong style | UncommonOptions | Use EmitUrls for inline URL output, or NoReferencesList to suppress the reference list.
File text has incorrect characters | ReadFileToString, WriteStringToFile | Verify that the correct source or output charset was supplied.
Need operation details after failure | LastErrorText | Check diagnostic text after failed or unexpected conversion, file read, or file write operations.

Common Pitfalls

Pitfall | Better Approach
Expecting URLs to appear when SuppressLinks is left at its default. | Set SuppressLinks to false when URL output is required.
Using the wrong charset when reading the HTML source file. | Pass the actual file encoding to ReadFileToString.
Saving text with the wrong output encoding. | Pass the desired output encoding to WriteStringToFile.
Expecting no line wrapping while leaving RightMargin at 80. | Set RightMargin to 0 for no right margin.
Ignoring diagnostic information after a failed conversion or file operation. | Check LastErrorText for details.

Best Practices

Recommendation | Reason
Keep DecodeHtmlEntities enabled for normal text output. | It produces readable text by converting HTML entities to their intended characters.
Set RightMargin based on where the text will be displayed. | Wrapped output is easier to read in fixed-width logs, emails, or console output.
Decide how URLs should be represented before converting. | The combination of SuppressLinks and UncommonOptions controls whether URLs are suppressed, referenced, or emitted inline.
Use ToTextSb when using StringBuilder-based workflows. | It avoids unnecessary string copying when the HTML is already stored in a mutable buffer.
Specify charsets explicitly for file input and output. | Explicit encodings help avoid mojibake or data loss when reading and writing non-ASCII text.
Check LastErrorText after failures. | It provides useful diagnostic detail for conversion and file I/O operations.

Summary

Chilkat.HtmlToText is a focused utility class for converting HTML to plain text. It supports entity decoding, configurable line wrapping, flexible hyperlink handling, string and StringBuilder conversion, and convenience methods for reading and writing text files with specified character encodings.

The most important practical guidance is to choose whether URLs should be suppressed, listed, or emitted inline; set RightMargin for the desired wrapping behavior; keep DecodeHtmlEntities enabled for normal readable output; and use the correct charset when reading from or writing to files.