Chilkat.Mht Class Overview

Chilkat.Mht creates MHT web archives and EML email messages from web pages, local HTML files, or in-memory HTML strings. It can download and embed related resources such as images and style sheets, optionally include scripts for MHT output, use disk caching, support proxies and authentication, save or zip the generated output, and unpack existing MHT files back into HTML and supporting files.

What the Class Is Used For

Use Chilkat.Mht when an application needs to package an HTML page and its related resources into a single MIME-based document. The input can be a website URL, a local HTML file, or an HTML string. The output can be MHT, EML, a saved file, an in-memory string, or an entry appended to a ZIP file.

Create MHT Archives: Convert web pages, local HTML files, or HTML strings into MHT documents with embedded page resources.
Create EML Messages: Convert HTML into email-ready EML MIME messages with embedded images and style sheets.
Control Embedded Content: Decide whether images are embedded, whether scripts are included, whether CIDs are used, and how MIME Content-Disposition headers are written.
Unpack MHT Files: Extract MHT contents into an HTML file and supporting resource files.

Typical Workflow: Create MHT from a URL or HTML File

  1. Create an Mht object.
  2. Configure network settings when needed, such as Proxy, SocksVersion, NtlmAuth, website credentials (WebSiteLogin, WebSitePassword), timeouts (ConnectTimeout, ReadTimeout), or SSL certificate verification (RequireSslCertVerify).
  3. Configure output behavior, such as EmbedImages, UseCids, NoScripts, PreferMHTScripts, UseFilename, and UseInline.
  4. If converting a local HTML file or HTML string with relative links, set BaseUrl.
  5. Optionally configure disk caching with AddCacheRoot, FetchFromCache, UpdateCache, and NumCacheLevels.
  6. Call GetMHT, GetAndSaveMHT, or GetAndZipMHT.
  7. Check LastErrorText after failures or unexpected behavior.

Typical Workflow: Create EML from HTML

  1. Create an Mht object.
  2. Configure image, CID, cache, proxy, authentication, timeout, and SSL settings as needed.
  3. Use a URL, local HTML file path, or in-memory HTML string as input.
  4. Call GetEML, GetAndSaveEML, GetAndZipEML, HtmlToEML, or HtmlToEMLFile.
  5. Expect no scripts in the result: they are always removed when creating EML or emails from HTML.

Core Concepts

Concept | Meaning | Important Members
MHT / MHTML Output | A single MIME document containing HTML and related resources such as images, style sheets, and optionally scripts. | GetMHT, GetAndSaveMHT, HtmlToMHT
EML Output | MIME email output created from a web page, local HTML file, or HTML string. | GetEML, GetAndSaveEML, HtmlToEML
Embedded Resources | External images, style sheets, and other related page parts can be downloaded and embedded into the output. | EmbedImages, EmbedLocalOnly, UseCids
Base URL | The URL used to resolve relative links when the input is a local file or in-memory HTML string rather than a website URL. | BaseUrl
Disk Cache | Optional cache used to fetch or store page parts such as images and style sheets. | AddCacheRoot, FetchFromCache, UpdateCache
Unpacking | Existing MHT content can be unpacked into an HTML file and supporting resource files. | UnpackMHT, UnpackMHTString, UnpackDirect
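The sketch below shows the multipart/related structure and the UseCids choice in concrete form. It builds a tiny MHT-like document with Python's standard email package. It is not Chilkat code. The function build_related, the example.invalid Content-ID format, and the assumption that every image is PNG are all hypothetical, and the URL rewrite is a plain string replacement.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_related(html: str, images: dict[str, bytes], use_cids: bool = True) -> str:
    """Package HTML plus images into one multipart/related MIME document.

    `images` maps each image URL, exactly as written in the HTML, to PNG bytes.
    """
    image_parts = []
    for n, (url, data) in enumerate(images.items(), start=1):
        part = MIMEImage(data, _subtype="png")  # assumption: every image is PNG
        filename = url.rsplit("/", 1)[-1]
        part.add_header("Content-Disposition", "inline", filename=filename)
        if use_cids:
            cid = f"part{n}@example.invalid"  # hypothetical Content-ID format
            part["Content-ID"] = f"<{cid}>"
            # Naive text replacement; a real converter edits parsed HTML.
            html = html.replace(url, f"cid:{cid}")
        else:
            # The HTML keeps the URL; readers match it via Content-Location.
            part["Content-Location"] = url
        image_parts.append(part)

    root = MIMEMultipart("related")
    root.attach(MIMEText(html, "html"))  # the HTML root part comes first
    for part in image_parts:
        root.attach(part)
    return root.as_string()
```

When use_cids=True, the img tag points at cid:part1@example.invalid and the image part carries the matching Content-ID. When use_cids=False, the URL stays unchanged and the part carries a Content-Location header instead. This mirrors the UseCids behavior described for Chilkat.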

Conversion Methods

Input | MHT Output | EML Output | Result Form
URL or local HTML file | GetMHT | GetEML | Returns the generated MIME data as a string.
URL or local HTML file | GetAndSaveMHT | GetAndSaveEML | Saves the generated output to a file.
URL or local HTML file | GetAndZipMHT | GetAndZipEML | Compresses the generated output and appends it to a ZIP file.
In-memory HTML string | HtmlToMHT | HtmlToEML | Returns the generated MIME data as a string.
In-memory HTML string | HtmlToMHTFile | HtmlToEMLFile | Saves the generated output to a file.
Input flexibility: URL/file-based methods accept either a web page URL or a local HTML file path. HTML string methods are used when the HTML is already in memory.
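The table amounts to a lookup on three choices: input kind, output format, and result form. The Python sketch below encodes that lookup. The method names come straight from the table. The key labels ("url_or_file", "html_string", and so on) and the choose_method helper are hypothetical.

```python
# Method names are copied from the table; the key labels are hypothetical.
# "url_or_file" covers both a web page URL and a local HTML file path.
METHODS = {
    ("url_or_file", "mht", "string"): "GetMHT",
    ("url_or_file", "eml", "string"): "GetEML",
    ("url_or_file", "mht", "file"): "GetAndSaveMHT",
    ("url_or_file", "eml", "file"): "GetAndSaveEML",
    ("url_or_file", "mht", "zip"): "GetAndZipMHT",
    ("url_or_file", "eml", "zip"): "GetAndZipEML",
    ("html_string", "mht", "string"): "HtmlToMHT",
    ("html_string", "eml", "string"): "HtmlToEML",
    ("html_string", "mht", "file"): "HtmlToMHTFile",
    ("html_string", "eml", "file"): "HtmlToEMLFile",
}


def choose_method(source: str, output: str, result: str) -> str:
    """Return the Chilkat.Mht method name for a source/output/result combination."""
    try:
        return METHODS[(source, output, result)]
    except KeyError:
        raise ValueError(f"no single method for {source} -> {output} as {result}") from None
```

The table lists no ZIP method for in-memory HTML, so this sketch raises an error for that combination.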

Embedding and MIME Output Properties

Property | Purpose | Default / Guidance
EmbedImages | Controls whether images are embedded in the generated MHT/EML. | If false, images are not embedded, and image src attributes are converted to absolute URLs when necessary.
EmbedLocalOnly | Embeds only images found on the local filesystem. | Useful when remote images should remain external but local file references should be embedded.
UseCids | Controls whether embedded references use generated cid: URLs. | Default is true. If false, URLs are left unchanged and embedded parts contain matching Content-Location headers.
UseFilename | Adds a filename attribute to each embedded item’s Content-Disposition header. | Default is true.
UseInline | Adds an inline attribute to each embedded item’s Content-Disposition header. | Default is true.
NoScripts | Removes scripts when creating MHT files. | Default is false. Applies only to MHT creation; scripts are always removed when creating EML or emails from HTML.
PreferMHTScripts | Chooses between scripts and noscript alternatives when possible. | Default is true, preserving scripts and discarding noscript alternatives. If false, scripts with noscript alternatives are removed and the noscript content is kept.
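A small regex toy can show what the PreferMHTScripts choice and script removal do to markup. This is only a sketch. It assumes simple, well-formed HTML in which a <noscript> alternative immediately follows its <script>, and choose_alternative and strip_scripts are hypothetical names. The documentation does not say how Chilkat combines NoScripts with PreferMHTScripts, so the sketch keeps the two steps separate.

```python
import re

_FLAGS = re.IGNORECASE | re.DOTALL
# (?:(?!</script).)* stops a match from running past the first closing tag.
SCRIPT = re.compile(r"<script\b[^>]*>(?:(?!</script).)*</script\s*>", _FLAGS)
SCRIPT_WITH_NOSCRIPT = re.compile(
    r"(<script\b[^>]*>(?:(?!</script).)*</script\s*>)\s*"
    r"<noscript\b[^>]*>((?:(?!</noscript).)*)</noscript\s*>",
    _FLAGS,
)


def choose_alternative(html: str, prefer_mht_scripts: bool = True) -> str:
    """Keep either the script or its <noscript> content, mimicking PreferMHTScripts."""
    keep = r"\1" if prefer_mht_scripts else r"\2"
    return SCRIPT_WITH_NOSCRIPT.sub(keep, html)


def strip_scripts(html: str) -> str:
    """Drop every <script> element, as NoScripts (MHT) or EML output would."""
    return SCRIPT.sub("", html)
```

A production converter would work on parsed HTML rather than regular expressions. The regex form is used here only because it keeps the example short.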

URL, HTML, and Debugging Properties

Property | Purpose | When to Use
BaseUrl | Defines the base URL used to convert relative HREFs to absolute HREFs when processing a local HTML file or HTML string. | Set when the input is not a website URL and contains relative links.
DebugHtmlBefore | Filename where the input HTML is saved before conversion. | Use when troubleshooting resource discovery, link rewriting, or conversion problems.
DebugHtmlAfter | Filename where the resulting HTML is saved after conversion processing. | Compare the before and after HTML to see how the conversion changed references.
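BaseUrl resolution works like ordinary relative-reference resolution. The sketch below approximates it with Python's urllib.parse.urljoin. It assumes Chilkat resolves references the way RFC 3986 does, it handles only double-quoted src and href attributes, and absolutize is a hypothetical name.

```python
import re
from urllib.parse import urljoin

# Only double-quoted src/href attributes are handled in this sketch.
LINK_ATTR = re.compile(r'\b(src|href)\s*=\s*"([^"]*)"', re.IGNORECASE)


def absolutize(html: str, base_url: str) -> str:
    """Rewrite relative src/href values to absolute URLs using base_url."""
    def fix(match: re.Match) -> str:
        # urljoin leaves absolute URLs and other schemes (cid:, mailto:) untouched.
        return f'{match.group(1)}="{urljoin(base_url, match.group(2))}"'
    return LINK_ATTR.sub(fix, html)
```

Under these rules the trailing slash matters. With a base of https://www.example.com/docs, the reference images/a.png resolves to https://www.example.com/images/a.png, not to a path under /docs/. Ending the base URL with a slash or a page name avoids this, assuming Chilkat applies the same rules.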

HTTP, Proxy, and Authentication Properties

Property | Purpose | Default / Guidance
ConnectTimeout | Seconds to wait before timing out while connecting to an HTTP server. | Default is 10 seconds.
ReadTimeout | Seconds to wait while no additional data is forthcoming from the HTTP server. | Default is 20 seconds. This is an idle read timeout, not a limit on the total transfer duration.
PreferIpv6 | Prefer IPv6 over IPv4 when both are supported for a domain. | Default is false, preferring IPv4.
RequireSslCertVerify | Requires SSL server certificate verification. | Default is false. If true, expired certificates or invalid signatures prevent the connection.
Proxy | HTTP proxy host and port. | Format as hostname:port, such as www.chilkatsoft.com:100.
ProxyLogin / ProxyPassword | Credentials for an authenticating HTTP proxy. | Set only when the HTTP proxy requires authentication.
UseIEProxy | Uses the proxy host/port configured for Internet Explorer. | Useful on Windows when the application should follow IE proxy settings.
NtlmAuth | Enables NTLM / Integrated Windows Authentication for website access. | Default is false.
WebSiteLogin / WebSitePassword | Login and password for a web page requiring authentication. | Set when the page requires credentials.
WebSiteLoginDomain | Optional domain name used with NTLM authentication. | Use with NtlmAuth when a domain is required.
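A small model makes the difference between an idle read timeout and a total transfer limit easy to see. The function below is a hypothetical Python sketch, not Chilkat's implementation. It receives the times (in seconds after the request was sent) at which data chunks arrived and reports whether any silent gap exceeded the ReadTimeout value.

```python
def idle_timeout_hit(arrivals: list[float], read_timeout: float) -> bool:
    """Return True if any silence between data arrivals exceeds read_timeout.

    arrivals: seconds after the request was sent at which chunks arrived,
    in increasing order. Total elapsed time is deliberately not checked.
    """
    last = 0.0
    for t in arrivals:
        if t - last > read_timeout:
            return True
        last = t
    return False
```

A 100-second download that delivers a chunk every 5 seconds never trips the default 20-second idle timeout. A single 25-second stall does.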

SOCKS Proxy Properties

Property | Purpose | Guidance
SocksVersion | Selects whether a SOCKS proxy is used. | 0 = no SOCKS proxy, 4 = SOCKS4, 5 = SOCKS5. Default is 0.
SocksHostname | SOCKS4/SOCKS5 proxy hostname or IPv4 address. | Used only when SocksVersion is 4 or 5.
SocksPort | SOCKS proxy port. | Default is 1080.
SocksUsername | SOCKS4/SOCKS5 username. | Used only when a SOCKS proxy is configured.
SocksPassword | SOCKS5 password. | SOCKS4 does not use a password.
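The SOCKS rules above can be checked before any request is made. SocksSettings and its problems() method are hypothetical Python names. The fields mirror the documented properties and defaults. Treating a missing SocksHostname as an error is an assumption based on the table.

```python
from dataclasses import dataclass


@dataclass
class SocksSettings:
    version: int = 0    # SocksVersion: 0 = none, 4 = SOCKS4, 5 = SOCKS5
    hostname: str = ""  # SocksHostname
    port: int = 1080    # SocksPort (documented default)
    username: str = ""  # SocksUsername
    password: str = ""  # SocksPassword (SOCKS5 only)

    def problems(self) -> list[str]:
        """Return human-readable configuration problems (empty list if none)."""
        found = []
        if self.version not in (0, 4, 5):
            found.append("SocksVersion must be 0, 4, or 5")
        if self.version in (4, 5) and not self.hostname:
            found.append("SocksHostname is required when SocksVersion is 4 or 5")
        if self.version == 4 and self.password:
            found.append("SOCKS4 does not use SocksPassword")
        if self.version == 0 and (self.hostname or self.username or self.password):
            found.append("SOCKS settings are unused while SocksVersion is 0")
        if not 0 < self.port < 65536:
            found.append("SocksPort must be 1-65535")
        return found
```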

Disk Cache Properties and Methods

Member | Purpose | Guidance
AddCacheRoot | Adds a disk cache root directory. | Call once for each cache root, such as D:\cacheRoot, E:\cacheRoot, and F:\cacheRoot.
GetCacheRoot | Returns the Nth cache root. | Indexing begins at 0.
NumCacheRoots | Number of configured cache roots. | Use to confirm the disk cache root configuration.
NumCacheLevels | Number of directory levels under each cache root. | Default is 0. Use 1 or 2 to spread large numbers of cached files across subdirectories.
FetchFromCache | Allows page parts such as images and style sheets to be fetched from the disk cache when possible. | Default is false.
UpdateCache | Automatically updates the disk cache with HTTP GET responses. | Default is false.
IgnoreNoCache | Allows caching even when response headers indicate the page should not be cached. | Default is false.
IgnoreMustRevalidate | Allows fresh cached content to be served without revalidation even when the response contains Cache-Control: must-revalidate. | Applies when FetchFromCache is true. Default is false.
Cache layout note: Multiple cache levels are useful for very large caches because storing thousands of files in a single directory can make file browsers and filesystems less responsive.
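The sketch below models how FetchFromCache, UpdateCache, and NumCacheLevels interact, using an in-memory dictionary in place of the disk. The documentation does not describe Chilkat's actual cache file naming or directory scheme. The SHA-1 digest and the two-hex-digit directory names are hypothetical, and IgnoreNoCache and IgnoreMustRevalidate are not modeled.

```python
import hashlib
import posixpath


class ToyCache:
    """In-memory stand-in for a disk cache; keys are hypothetical file paths."""

    def __init__(self, root, num_levels=0, fetch_from_cache=False, update_cache=False):
        self.root = root
        self.num_levels = num_levels              # like NumCacheLevels
        self.fetch_from_cache = fetch_from_cache  # like FetchFromCache
        self.update_cache = update_cache          # like UpdateCache
        self.files = {}                           # path -> cached bytes

    def path_for(self, url):
        digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
        # Each level adds one directory named by two hex digits (up to 256 per level).
        levels = [digest[2 * i:2 * i + 2] for i in range(self.num_levels)]
        return posixpath.join(self.root, *levels, digest)

    def get(self, url, download):
        path = self.path_for(url)
        if self.fetch_from_cache and path in self.files:
            return self.files[path]               # served without a network request
        data = download(url)
        if self.update_cache:
            self.files[path] = data               # store the GET response for next time
        return data
```

With both flags on, a second request for the same URL skips the download. With UpdateCache off, nothing is ever stored, so FetchFromCache alone never finds anything to reuse.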

Custom Headers and Resource Inclusion

Method | Purpose | When to Use
AddCustomHeader | Adds a custom HTTP header to all HTTP requests sent by the MHT component. | Call once for each custom header field needed by the target site.
RemoveCustomHeader | Removes a custom header by field name. | Use when a previously configured header should no longer be sent.
ClearCustomHeaders | Removes all accumulated custom headers. | Use before converting unrelated pages that require different request headers.
AddExternalStyleSheet | Includes an additional style sheet that would not normally be detected. | Rarely needed. Useful when style sheet names are constructed dynamically in JavaScript.
ExcludeImagesMatching | Prevents images whose URLs match a pattern from being embedded. | Rarely needed. Useful for removing unused images referenced by style sheets so they do not appear as attachments.
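Custom headers accumulate until they are removed or cleared, and that is how stale headers end up sent to unrelated sites. HeaderSet is a hypothetical Python sketch of that add/remove/clear lifecycle. It builds a urllib.request.Request but never sends it. Two behaviors are assumptions, not documented Chilkat behavior: adding a header name twice replaces the earlier value, and names are matched case-insensitively.

```python
import urllib.request


class HeaderSet:
    """Hypothetical model of AddCustomHeader / RemoveCustomHeader / ClearCustomHeaders."""

    def __init__(self):
        self._headers = {}  # lower-cased field name -> (name, value)

    def add(self, name, value):
        # Assumption: adding the same field name again replaces the earlier value.
        self._headers[name.lower()] = (name, value)

    def remove(self, name):
        # HTTP field names are case-insensitive, so match them that way.
        self._headers.pop(name.lower(), None)

    def clear(self):
        self._headers.clear()

    def request(self, url):
        """Build (but do not send) a request carrying every accumulated header."""
        req = urllib.request.Request(url)
        for name, value in self._headers.values():
            req.add_header(name, value)
        return req
```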

Unpacking MHT Files

Member | Purpose | Important Details
UnpackMHT | Unpacks an MHT file into an HTML file and supporting resource files. | Takes the MHT filename, unpack directory, HTML filename, and parts subdirectory.
UnpackMHTString | Same as UnpackMHT, but the MHT is supplied as an in-memory string. | Useful when the MHT content is already in memory.
UnpackDirect | Controls whether MHT is unpacked directly without transformations. | Default is false. When true, the HTML is not edited and related parts are unpacked to subdirectories rooted in the unpack directory.
UnpackUseRelPaths | Controls whether relative or absolute paths are used in the unpacked HTML. | Default is true (relative paths). Set to false to use absolute paths.
Direct unpack limitation: Direct unpacking is only possible when the MHT Content-Location headers do not contain URLs. The related item locations must contain relative paths.
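The sketch below illustrates what a transformed (non-direct) unpack does. It parses an MHT string with Python's email package and collects the non-HTML parts under a parts subdirectory. It then rewrites cid: and Content-Location references to relative or absolute paths, loosely mirroring UnpackMHTString and UnpackUseRelPaths. Both function names are hypothetical. The file-naming rule is invented, and neither name collisions nor Chilkat's exact rewriting are modeled.

```python
import email
import os
from email import policy


def plan_unpack(mht_text, unpack_dir, parts_subdir="parts", use_rel_paths=True):
    """Return (rewritten_html, {relative_path: bytes}) without writing anything."""
    msg = email.message_from_string(mht_text, policy=policy.default)
    html = None
    files = {}
    refs = {}  # reference as written in the HTML -> local path to use instead
    for n, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue
        if html is None and part.get_content_type() == "text/html":
            html = part.get_content()
            continue
        location = str(part.get("Content-Location", ""))
        # Hypothetical naming rule: filename parameter, else last URL segment.
        name = part.get_filename() or location.rsplit("/", 1)[-1] or f"part{n}"
        rel_path = f"{parts_subdir}/{name}"
        files[rel_path] = part.get_payload(decode=True)
        if use_rel_paths:
            link = rel_path
        else:
            link = os.path.abspath(os.path.join(unpack_dir, rel_path))
        cid = part.get("Content-ID")
        if cid:
            refs["cid:" + str(cid).strip().strip("<>")] = link
        if location:
            refs[location] = link
    if html is None:
        raise ValueError("no text/html part found")
    for ref, link in refs.items():
        html = html.replace(ref, link)  # naive textual rewrite
    return html, files


def unpack_mht_string(mht_text, unpack_dir, html_name, parts_subdir="parts", use_rel_paths=True):
    """Write the HTML file and its parts under unpack_dir."""
    html, files = plan_unpack(mht_text, unpack_dir, parts_subdir, use_rel_paths)
    for rel_path, data in files.items():
        path = os.path.join(unpack_dir, rel_path)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
    with open(os.path.join(unpack_dir, html_name), "w", encoding="utf-8") as f:
        f.write(html)
```

The planning step is kept separate from the file writing so the rewrite logic can be inspected without touching the disk.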

Method Summary by Category

Category | Methods | Purpose
Create MHT | GetMHT, GetAndSaveMHT, GetAndZipMHT, HtmlToMHT, HtmlToMHTFile | Generate MHT output from URLs, local HTML files, or in-memory HTML strings.
Create EML | GetEML, GetAndSaveEML, GetAndZipEML, HtmlToEML, HtmlToEMLFile | Generate EML email output from URLs, local HTML files, or in-memory HTML strings.
Unpack MHT | UnpackMHT, UnpackMHTString | Extract MHT content into HTML and supporting resource files.
Cache setup | AddCacheRoot, GetCacheRoot | Configure and inspect disk cache root directories.
HTTP request customization | AddCustomHeader, RemoveCustomHeader, ClearCustomHeaders | Add, remove, or clear custom headers sent by the MHT component.
Resource control | AddExternalStyleSheet, ExcludeImagesMatching | Include extra style sheets or exclude images matching a pattern.
Async and reset | LoadTaskCaller, RestoreDefaults | Support async task workflows and restore default property settings.

Diagnostics and Troubleshooting

Problem Area | Member | What to Check
Relative links are not resolved correctly | BaseUrl | Set BaseUrl when converting a local HTML file or HTML string that contains relative links.
Images are missing from output | EmbedImages, EmbedLocalOnly, ExcludeImagesMatching | Confirm that EmbedImages is true, that EmbedLocalOnly is not filtering out remote images, and that no ExcludeImagesMatching pattern matches the missing images.
Scripts appear or disappear unexpectedly | NoScripts, PreferMHTScripts | NoScripts applies only to MHT creation; scripts are always removed when creating EML or emails from HTML. PreferMHTScripts decides whether a script or its noscript alternative is kept.
Remote resources cannot be fetched | Proxy, SocksVersion, WebSiteLogin, NtlmAuth, ConnectTimeout, ReadTimeout | Check proxy settings, SOCKS settings, authentication, and timeout values.
SSL/TLS connections are accepted or rejected unexpectedly | RequireSslCertVerify | With the default (false), expired certificates or invalid signatures do not prevent the connection. Set it to true when the server certificate must be valid; connections then fail if the certificate is expired or has an invalid signature.
Need to understand how HTML was transformed | DebugHtmlBefore, DebugHtmlAfter | Save the input and processed HTML to files for comparison.
Unpacked HTML references do not look right | UnpackDirect, UnpackUseRelPaths | Choose direct vs. transformed unpacking, and relative vs. absolute paths, based on how the unpacked output will be used.
Need operation details after failure | LastErrorText | Check diagnostic text after failed or unexpected conversion, download, cache, unpack, proxy, authentication, or file operations.
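When images go missing, the three image settings act as independent filters. should_embed is a hypothetical Python sketch of that logic. It makes two assumptions. First, src has already been resolved to an absolute URL or filesystem path. Second, ExcludeImagesMatching patterns use * wildcards, which this overview does not specify.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def is_local(src: str) -> bool:
    """Assumes src is already resolved to an absolute URL or a filesystem path."""
    scheme = urlparse(src).scheme.lower()
    # No scheme means a plain path; a one-letter scheme is a Windows drive (C:\...).
    return scheme in ("", "file") or len(scheme) == 1


def should_embed(src, embed_images=True, embed_local_only=False, exclude_patterns=()):
    if not embed_images:
        return False  # EmbedImages false: nothing is embedded
    if any(fnmatchcase(src, p) for p in exclude_patterns):
        return False  # matched an ExcludeImagesMatching-style pattern
    if embed_local_only and not is_local(src):
        return False  # EmbedLocalOnly true: remote images stay external
    return True
```

Because every check can only veto embedding, their order does not change the result. Any one of them explains a missing image.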

Common Pitfalls

Pitfall | Better Approach
Converting local HTML with relative URLs without setting BaseUrl. | Set BaseUrl so relative HREFs and resource URLs can be resolved correctly.
Expecting EML output to preserve scripts. | Scripts are always removed when creating EML or emails from HTML.
Expecting images to be embedded when EmbedImages is false. | Set EmbedImages to true to embed images. If it is false, image URLs are left as external references.
Using cache-related properties without adding a cache root. | Call AddCacheRoot once for each cache root when disk caching is used.
Assuming ReadTimeout limits total download time. | Treat it as an idle timeout that applies only while no additional data is arriving.
Using UnpackDirect for MHT files whose Content-Location values are URLs. | Direct unpacking requires related-item Content-Location headers to contain relative paths, not URLs.
Letting custom headers accumulate across unrelated conversions. | Use ClearCustomHeaders or RemoveCustomHeader before processing unrelated sites.
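Several of these pitfalls can be caught before a conversion runs. The lint function below is a hypothetical Python sketch. Its settings argument is a plain dictionary keyed by Chilkat property names. CacheRoots is an invented key that stands in for the roots added with AddCacheRoot.

```python
import re
from urllib.parse import urlparse

# Double-quoted src/href values only; enough for a pre-flight check.
LINK_VALUE = re.compile(r'\b(?:src|href)\s*=\s*"([^"]*)"', re.IGNORECASE)


def has_relative_link(html: str) -> bool:
    """True if any src/href value lacks a scheme (fragment-only links excluded)."""
    return any(
        not urlparse(value).scheme and not value.startswith("#")
        for value in LINK_VALUE.findall(html)
    )


def lint(settings: dict, source_kind: str, html: str = "") -> list[str]:
    """Flag documented pitfalls. source_kind is "url", "file", or "html_string"."""
    warnings = []
    if (source_kind in ("file", "html_string")
            and not settings.get("BaseUrl")
            and has_relative_link(html)):
        warnings.append("Relative links found but BaseUrl is not set")
    if ((settings.get("FetchFromCache") or settings.get("UpdateCache"))
            and not settings.get("CacheRoots")):  # hypothetical key for AddCacheRoot calls
        warnings.append("Disk caching is enabled but no cache root was added with AddCacheRoot")
    if settings.get("IgnoreMustRevalidate") and not settings.get("FetchFromCache"):
        warnings.append("IgnoreMustRevalidate has no effect unless FetchFromCache is true")
    return warnings
```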

Best Practices

Recommendation | Reason
Set BaseUrl for local HTML files and HTML strings. | It allows Chilkat to resolve relative links and resource references correctly.
Use DebugHtmlBefore and DebugHtmlAfter when conversion output is unexpected. | Comparing the before and after HTML helps identify reference-rewriting and resource-discovery issues.
Use disk caching for repeated conversions of similar pages. | AddCacheRoot, FetchFromCache, and UpdateCache can reduce repeated downloads of images and style sheets.
Choose CID behavior deliberately. | With UseCids set to true (the default), embedded references become generated cid: links. Setting it to false preserves the original URLs and relies on matching Content-Location headers.
Use HtmlToMHT* and HtmlToEML* when HTML is already in memory. | This avoids writing temporary HTML input files.
Use the ZIP output methods when archiving generated MHT/EML results. | GetAndZipMHT and GetAndZipEML create the output and append it to a ZIP file in one step.
Call RestoreDefaults after specialized conversions. | It prevents one conversion’s special settings from affecting later conversions.
Check LastErrorText after failures. | It provides diagnostic detail for HTTP fetching, resource embedding, caching, proxy authentication, SSL verification, file writing, ZIP output, and unpacking.

Summary

Chilkat.Mht is the Chilkat class for creating and unpacking MIME-based web archives and email messages from HTML. It can convert URLs, local HTML files, and in-memory HTML strings to MHT or EML, embed related resources, save or ZIP the generated output, manage caching, use HTTP/SOCKS proxies, authenticate to websites, control script and image handling, and unpack MHT files into HTML plus supporting resources.

The most important practical guidance is to set BaseUrl for local or in-memory HTML with relative links, choose embedding and CID options deliberately, configure proxy/authentication settings before fetching remote resources, use caching for repeated conversions, and inspect LastErrorText and debug HTML files when conversion output is unexpected.