Chilkat.Mht Class Overview
Chilkat.Mht creates MHT web archives and EML email messages from web pages, local HTML files, or in-memory HTML strings. It can download and embed related resources such as images and style sheets, optionally include scripts for MHT output, use disk caching, support proxies and authentication, save or zip the generated output, and unpack existing MHT files back into HTML and supporting files.
What the Class Is Used For
Use Chilkat.Mht when an application needs to package an HTML page and its related resources into a single MIME-based document. The input can be a website URL, a local HTML file, or an HTML string. The output format is either MHT or EML, and the result can be returned as an in-memory string, saved to a file, or appended as an entry to a ZIP file.
Typical Workflow: Create MHT from a URL or HTML File
- Create an Mht object.
- Configure network settings when needed, such as Proxy, SocksVersion, NtlmAuth, website credentials, timeouts, or SSL certificate verification.
- Configure output behavior, such as EmbedImages, UseCids, NoScripts, PreferMHTScripts, UseFilename, and UseInline.
- If converting a local HTML file or HTML string with relative links, set BaseUrl.
- Optionally configure disk caching with AddCacheRoot, FetchFromCache, UpdateCache, and NumCacheLevels.
- Call GetMHT, GetAndSaveMHT, or GetAndZipMHT. For an in-memory HTML string, call HtmlToMHT or HtmlToMHTFile instead.
- Check LastErrorText after failures or unexpected behavior.
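
The BaseUrl step in the workflow above can be pictured with a short standard-library sketch. It does not call Chilkat. It only shows the kind of relative-to-absolute resolution that a base URL makes possible when the input is a local file or an HTML string. The base URL, the sample references, and the helper name `resolve_refs` are illustrative assumptions, not part of the Chilkat API.

```python
from urllib.parse import urljoin


def resolve_refs(base_url, refs):
    """Resolve each reference against base_url; absolute URLs pass through unchanged."""
    return [urljoin(base_url, ref) for ref in refs]


# Hypothetical base URL and references, chosen only for illustration.
base_url = "https://www.example.com/docs/page.html"
refs = ["images/logo.png", "../css/site.css", "https://cdn.example.com/app.js"]

resolved = resolve_refs(base_url, refs)
for original, absolute in zip(refs, resolved):
    print(f"{original} -> {absolute}")
```

Without a base URL there is nothing to resolve `images/logo.png` against, which is why relative references in local or in-memory HTML need BaseUrl to be set.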
Typical Workflow: Create EML from HTML
- Create an Mht object.
- Configure image, CID, cache, proxy, authentication, timeout, and SSL settings as needed.
- Use a URL, local HTML file path, or in-memory HTML string as input.
- Call GetEML, GetAndSaveEML, GetAndZipEML, HtmlToEML, or HtmlToEMLFile.
- Note that scripts are always removed when creating EML or emails from HTML, regardless of the NoScripts setting.
Core Concepts
| Concept | Meaning | Important Members |
|---|---|---|
| MHT / MHTML Output | A single MIME document containing HTML and related resources such as images, style sheets, and optionally scripts. | GetMHT, GetAndSaveMHT, HtmlToMHT |
| EML Output | MIME email output created from a web page, local HTML file, or HTML string. | GetEML, GetAndSaveEML, HtmlToEML |
| Embedded Resources | External images, style sheets, and other related page parts can be downloaded and embedded into the output. | EmbedImages, EmbedLocalOnly, UseCids |
| Base URL | The URL used to resolve relative links when the input is a local file or in-memory HTML string rather than a website URL. | BaseUrl |
| Disk Cache | Optional cache used to fetch or store page parts such as images and style sheets. | AddCacheRoot, FetchFromCache, UpdateCache |
| Unpacking | Existing MHT content can be unpacked into an HTML file and supporting resource files. | UnpackMHT, UnpackMHTString, UnpackDirect |
Conversion Methods
| Input | MHT Output | EML Output | Result Form |
|---|---|---|---|
| URL or local HTML file | GetMHT | GetEML | Returns the generated MIME data as a string. |
| URL or local HTML file | GetAndSaveMHT | GetAndSaveEML | Saves the generated output to a file. |
| URL or local HTML file | GetAndZipMHT | GetAndZipEML | Compresses and appends the generated output to a ZIP file. |
| In-memory HTML string | HtmlToMHT | HtmlToEML | Returns the generated MIME data as a string. |
| In-memory HTML string | HtmlToMHTFile | HtmlToEMLFile | Saves the generated output to a file. |
Embedding and MIME Output Properties
| Property | Purpose | Default / Guidance |
|---|---|---|
| EmbedImages | Controls whether images are embedded in the generated MHT/EML. | If false, image src attributes are converted to absolute URLs when necessary, and images are not embedded. |
| EmbedLocalOnly | Embeds only images found on the local filesystem. | Useful when remote images should remain external but local file references should be embedded. |
| UseCids | Controls whether embedded references use generated cid: URLs. | Default is true. If false, URLs are left unchanged and embedded parts contain matching Content-Location headers. |
| UseFilename | Adds a filename attribute to each embedded item’s Content-Disposition header. | Default is true. |
| UseInline | Adds an inline attribute to each embedded item’s Content-Disposition header. | Default is true. |
| NoScripts | Removes scripts when creating MHT files. | Default is false. Applies only to MHT creation. Scripts are always removed when creating EML or emails from HTML. |
| PreferMHTScripts | Chooses between scripts and noscript alternatives when possible. | Default is true, preserving scripts and discarding noscript alternatives. If false, scripts with noscript alternatives are removed and the noscript content is kept. |
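
To make the UseCids, UseInline, and UseFilename rows more concrete, here is a minimal sketch that builds a multipart/related message with Python's standard `email` package. It is not Chilkat code and does not claim to reproduce Chilkat's exact output. The function name `build_related`, the Content-ID value, the file name, and the placeholder image bytes are all assumptions made for illustration.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_related(html, image_bytes, image_url, use_cids):
    """Build a multipart/related message that embeds one image.

    use_cids=True  -> the HTML reference becomes a cid: URL and the image part
                      carries a matching Content-ID header.
    use_cids=False -> the HTML is left unchanged and the image part carries a
                      Content-Location header equal to the original URL.
    """
    root = MIMEMultipart("related")
    image = MIMEImage(image_bytes, _subtype="png")
    if use_cids:
        cid = "img1@example.invalid"  # hypothetical generated ID
        html = html.replace(image_url, "cid:" + cid)
        image.add_header("Content-ID", "<" + cid + ">")
    else:
        image.add_header("Content-Location", image_url)
    # "inline" plus a filename parameter, analogous to UseInline and UseFilename.
    image.add_header("Content-Disposition", "inline", filename="logo.png")
    root.attach(MIMEText(html, "html"))
    root.attach(image)
    return root


url = "https://www.example.com/logo.png"          # hypothetical image URL
page = '<html><body><img src="' + url + '"></body></html>'
fake_png = b"\x89PNG\r\n\x1a\n"                   # placeholder bytes, not a real image

with_cids = build_related(page, fake_png, url, use_cids=True)
without_cids = build_related(page, fake_png, url, use_cids=False)
print(with_cids.get_payload()[1]["Content-ID"])
print(without_cids.get_payload()[1]["Content-Location"])
```

The two variants differ only in how the HTML part points at the image part, which is the choice UseCids controls.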
URL, HTML, and Debugging Properties
| Property | Purpose | When to Use |
|---|---|---|
| BaseUrl | Defines the base URL used to convert relative HREFs to absolute HREFs when processing a local HTML file or HTML string. | Set when the input is not a website URL and contains relative links. |
| DebugHtmlBefore | Filename where the input HTML is saved before conversion. | Use when troubleshooting resource discovery, link rewriting, or conversion problems. |
| DebugHtmlAfter | Filename where the result HTML is saved after conversion processing. | Compare before/after HTML to understand how the conversion changed references. |
HTTP, Proxy, and Authentication Properties
| Property | Purpose | Default / Guidance |
|---|---|---|
| ConnectTimeout | Seconds to wait before timing out while connecting to an HTTP server. | Default is 10 seconds. |
| ReadTimeout | Maximum number of seconds to wait when no additional data arrives from the HTTP server. | Default is 20 seconds. This is an idle read timeout, not a limit on the total transfer duration. |
| PreferIpv6 | Prefer IPv6 over IPv4 when both are supported for a domain. | Default is false, preferring IPv4. |
| RequireSslCertVerify | Requires SSL server certificate verification. | Default is false. If true, the connection fails when the server certificate is expired or has an invalid signature. |
| Proxy | HTTP proxy host and port. | Format as hostname:port, such as www.chilkatsoft.com:100. |
| ProxyLogin / ProxyPassword | Credentials for an authenticating HTTP proxy. | Set only when the HTTP proxy requires authentication. |
| UseIEProxy | Uses the proxy host/port configured for Internet Explorer. | Useful on Windows when the application should follow IE proxy settings. |
| NtlmAuth | Enables NTLM / Integrated Windows Authentication for website access. | Default is false. |
| WebSiteLogin / WebSitePassword | Login and password for a web page requiring authentication. | Set when the page requires credentials. |
| WebSiteLoginDomain | Optional domain name used with NTLM authentication. | Use with NtlmAuth when a domain is required. |
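
The difference between an idle read timeout and a total transfer limit can be shown with a small simulation. This is a sketch of the general idea only, not of Chilkat's internals. The function name `first_idle_timeout` and the sample arrival times are assumptions.

```python
def first_idle_timeout(arrival_times, read_timeout):
    """Return the index of the first chunk that arrives after more than
    read_timeout seconds of silence, or None if no gap is that long.

    arrival_times are seconds since the request started, in increasing order.
    """
    previous = 0.0
    for index, arrival in enumerate(arrival_times):
        if arrival - previous > read_timeout:
            return index
        previous = arrival
    return None


READ_TIMEOUT = 20  # seconds, matching the documented default

steady = [15, 30, 45, 60]  # 60 s in total, but never more than 15 s of silence
stalled = [5, 30]          # 25 s of silence before the second chunk

print(first_idle_timeout(steady, READ_TIMEOUT))   # None: slow but steady is fine
print(first_idle_timeout(stalled, READ_TIMEOUT))  # 1: the second chunk came too late
```

A transfer that takes a full minute succeeds as long as data keeps arriving, while a much shorter transfer fails if the server goes quiet for longer than the timeout.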
SOCKS Proxy Properties
| Property | Purpose | Guidance |
|---|---|---|
| SocksVersion | Selects whether a SOCKS proxy is used. | 0 = no SOCKS proxy, 4 = SOCKS4, 5 = SOCKS5. Default is 0. |
| SocksHostname | SOCKS4/SOCKS5 proxy hostname or IPv4 address. | Used only when SocksVersion is 4 or 5. |
| SocksPort | SOCKS proxy port. | Default is 1080. |
| SocksUsername | SOCKS4/SOCKS5 username. | Used only when a SOCKS proxy is configured. |
| SocksPassword | SOCKS5 password. | SOCKS4 does not use a password. |
Disk Cache Properties and Methods
| Member | Purpose | Guidance |
|---|---|---|
| AddCacheRoot | Adds a disk cache root directory. | Call once for each cache root, such as D:\cacheRoot, E:\cacheRoot, and F:\cacheRoot. |
| GetCacheRoot | Returns the Nth cache root. | Indexing begins at 0. |
| NumCacheRoots | Number of configured cache roots. | Use to confirm disk cache root configuration. |
| NumCacheLevels | Number of directory levels under each cache root. | Default is 0. Use 1 or 2 to spread large numbers of cached files across subdirectories. |
| FetchFromCache | Allows page parts such as images and style sheets to be fetched from disk cache when possible. | Default is false. |
| UpdateCache | Automatically updates the disk cache with HTTP GET responses. | Default is false. |
| IgnoreNoCache | Allows caching even when response headers indicate the page should not be cached. | Default is false. |
| IgnoreMustRevalidate | Allows fresh cached content to be served without revalidation even when the response contains Cache-Control: must-revalidate. | Applies when FetchFromCache is true. Default is false. |
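
The purpose of NumCacheLevels, spreading many cached files across subdirectories, can be illustrated with a generic hashed-path scheme. Chilkat's actual file naming and directory layout are not described here, so everything in this sketch is an assumption: the SHA-256 file names, the two-hex-character directory names, and the helper name `cache_path`.

```python
import hashlib
from pathlib import PurePosixPath


def cache_path(cache_root, url, num_levels):
    """Map a URL to a file path under cache_root using num_levels subdirectories.

    Assumed scheme (not Chilkat's): the file is named after the SHA-256 hex
    digest of the URL, and each directory level uses the next two hex digits.
    """
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    subdirs = [digest[2 * i: 2 * i + 2] for i in range(num_levels)]
    return PurePosixPath(cache_root, *subdirs, digest)


url = "https://www.example.com/images/logo.png"  # hypothetical resource URL
for levels in (0, 1, 2):
    print(levels, cache_path("/cacheRoot", url, levels))
```

With zero levels every file lands directly in the cache root. Each additional level divides the files among more directories, which keeps individual directories small when the cache holds a large number of entries.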
Custom Headers and Resource Inclusion
| Method | Purpose | When to Use |
|---|---|---|
| AddCustomHeader | Adds a custom HTTP header to all HTTP requests sent by the MHT component. | Call once for each custom header field needed by the target site. |
| RemoveCustomHeader | Removes a custom header by field name. | Use when a previously configured header should no longer be sent. |
| ClearCustomHeaders | Removes all accumulated custom headers. | Use before converting unrelated pages that require different request headers. |
| AddExternalStyleSheet | Includes an additional style sheet that would not normally be detected. | Rarely needed. Useful when style sheet names are constructed dynamically in JavaScript. |
| ExcludeImagesMatching | Prevents images whose URLs match a pattern from being embedded. | Rarely needed. Useful for removing unused images referenced by style sheets so they do not appear as attachments. |
Unpacking MHT Files
| Member | Purpose | Important Details |
|---|---|---|
| UnpackMHT | Unpacks an MHT file into an HTML file and supporting resource files. | Takes the MHT filename, unpack directory, HTML filename, and parts subdirectory. |
| UnpackMHTString | Same as UnpackMHT, but the MHT is supplied as an in-memory string. | Useful when MHT content is already in memory. |
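| UnpackDirect | Controls whether MHT is unpacked directly without transformations. | Default is false. When true, HTML is not edited and related parts are unpacked to subdirectories rooted in the unpack directory. Direct unpacking requires the Content-Location headers of related items to contain relative paths, not URLs. |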
| UnpackUseRelPaths | Controls whether relative or absolute paths are used in unpacked HTML. | Default is true, meaning relative paths are used. Set false to use absolute paths. |
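
As a rough picture of what unpacking involves, the sketch below parses a tiny MHT string with Python's standard `email` package and writes the HTML part and the related part to disk. It is not Chilkat code. The function name `unpack_mht_string` and its parameter order are assumptions that loosely mirror the inputs listed for UnpackMHT, and the sample MHT is made up. Unlike a transforming unpack, this sketch does not rewrite references inside the HTML.

```python
import email
import tempfile
from email import policy
from pathlib import Path

# A made-up two-part MHT: one HTML part and one tiny base64-encoded GIF.
SAMPLE_MHT = (
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/related; boundary="b1"; type="text/html"\r\n'
    "\r\n"
    "--b1\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Location: index.html\r\n"
    "\r\n"
    '<html><body><img src="images/dot.gif"></body></html>\r\n'
    "--b1\r\n"
    "Content-Type: image/gif\r\n"
    "Content-Transfer-Encoding: base64\r\n"
    "Content-Location: images/dot.gif\r\n"
    'Content-Disposition: inline; filename="dot.gif"\r\n'
    "\r\n"
    "R0lGODlhAQABAAAAACw=\r\n"
    "--b1--\r\n"
)


def unpack_mht_string(mht_text, unpack_dir, html_filename, parts_subdir):
    """Write the first text/html part to unpack_dir/html_filename and every
    other leaf part to unpack_dir/parts_subdir. Returns the written paths."""
    message = email.message_from_string(mht_text, policy=policy.default)
    root = Path(unpack_dir)
    parts_dir = root / parts_subdir
    parts_dir.mkdir(parents=True, exist_ok=True)
    written = []
    html_done = False
    for index, part in enumerate(message.walk()):
        if part.is_multipart():
            continue  # containers hold no data of their own
        if part.get_content_type() == "text/html" and not html_done:
            target = root / html_filename
            html_done = True
        else:
            # Fall back to a generated name when the part has no filename.
            name = part.get_filename() or f"part{index}.bin"
            target = parts_dir / Path(name).name
        target.write_bytes(part.get_payload(decode=True))
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp:
    written = unpack_mht_string(SAMPLE_MHT, tmp, "page.html", "parts")
    relative_names = sorted(p.relative_to(tmp).as_posix() for p in written)
    gif_bytes = (Path(tmp) / "parts" / "dot.gif").read_bytes()
    html_bytes = (Path(tmp) / "page.html").read_bytes()

print(relative_names)
```

The sample's related part uses a relative Content-Location (`images/dot.gif`), which is the form that direct unpacking expects.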
Method Summary by Category
| Category | Methods | Purpose |
|---|---|---|
| Create MHT | GetMHT, GetAndSaveMHT, GetAndZipMHT, HtmlToMHT, HtmlToMHTFile | Generate MHT output from URLs, local HTML files, or in-memory HTML strings. |
| Create EML | GetEML, GetAndSaveEML, GetAndZipEML, HtmlToEML, HtmlToEMLFile | Generate EML email output from URLs, local HTML files, or in-memory HTML strings. |
| Unpack MHT | UnpackMHT, UnpackMHTString | Extract MHT content into HTML and supporting resource files. |
| Cache setup | AddCacheRoot, GetCacheRoot | Configure and inspect disk cache root directories. |
| HTTP request customization | AddCustomHeader, RemoveCustomHeader, ClearCustomHeaders | Add, remove, or clear custom headers sent by the MHT component. |
| Resource control | AddExternalStyleSheet, ExcludeImagesMatching | Include extra style sheets or exclude images matching a pattern. |
| Async and reset | LoadTaskCaller, RestoreDefaults | Support async task workflows and restore default property settings. |
Diagnostics and Troubleshooting
| Problem Area | Member | What to Check |
|---|---|---|
| Relative links are not resolved correctly | BaseUrl | Set BaseUrl when converting a local HTML file or HTML string that contains relative links. |
| Images are missing from output | EmbedImages, EmbedLocalOnly, ExcludeImagesMatching | Confirm images are allowed to be embedded, and verify they are not excluded by local-only or pattern rules. |
| Scripts appear or disappear unexpectedly | NoScripts, PreferMHTScripts | Remember that NoScripts applies to MHT creation, while scripts are always removed when creating EML or emails from HTML. |
| Remote resources cannot be fetched | Proxy, SocksVersion, WebSiteLogin, NtlmAuth, ConnectTimeout, ReadTimeout | Check proxy settings, SOCKS settings, authentication, and timeout values. |
| SSL/TLS connection is accepted or rejected unexpectedly | RequireSslCertVerify | Set to true when the server certificate must be valid and verified. |
| Need to understand how HTML was transformed | DebugHtmlBefore, DebugHtmlAfter | Save the input and processed HTML to files for comparison. |
| Unpacked HTML references do not look right | UnpackDirect, UnpackUseRelPaths | Choose direct vs transformed unpacking and relative vs absolute paths based on how the unpacked output will be used. |
| Need operation details after failure | LastErrorText | Check diagnostic text after failed or unexpected conversion, download, cache, unpack, proxy, authentication, or file operations. |
Common Pitfalls
| Pitfall | Better Approach |
|---|---|
| Converting local HTML with relative URLs but not setting BaseUrl. | Set BaseUrl so relative HREFs and resource URLs can be resolved correctly. |
| Expecting EML output to preserve scripts. | Scripts are always removed when creating EML or emails from HTML. |
| Expecting images to be embedded when EmbedImages is false. | Set EmbedImages to true when images must travel with the document. If false, images stay external and their src attributes are converted to absolute URLs when necessary. |
| Using cache-related properties without adding a cache root. | Call AddCacheRoot once for each cache root when disk caching is used. |
| Assuming ReadTimeout limits total download time. | Treat it as an idle timeout that applies only while no additional data is arriving. A slow but steady download can run longer than the ReadTimeout value. |
| Using UnpackDirect for MHT files whose Content-Location values are URLs. | Direct unpacking requires related-item Content-Location headers to contain relative paths, not URLs. |
| Letting custom headers accumulate across unrelated conversions. | Use ClearCustomHeaders or RemoveCustomHeader before processing unrelated sites. |
Best Practices
| Recommendation | Reason |
|---|---|
| Set BaseUrl for local HTML files and HTML strings. | It allows Chilkat to resolve relative links and resource references correctly. |
| Use DebugHtmlBefore and DebugHtmlAfter when conversion output is unexpected. | Comparing the before/after HTML helps identify reference rewriting and resource discovery issues. |
| Use disk caching for repeated conversions of similar pages. | FetchFromCache, UpdateCache, and AddCacheRoot can reduce repeated downloads of images and style sheets. |
| Choose CID behavior deliberately. | With UseCids set to true (the default), embedded references are changed to generated cid: links. Setting it to false preserves the original URLs and uses matching Content-Location headers. |
| Use HtmlToMHT* and HtmlToEML* when HTML is already in memory. | This avoids writing temporary HTML input files. |
| Use ZIP output methods when archiving generated MHT/EML results. | GetAndZipMHT and GetAndZipEML create the output and append it to a ZIP file in one step. |
| Call RestoreDefaults after specialized conversions. | It prevents one conversion’s special settings from affecting later conversions. |
| Check LastErrorText after failures. | It provides useful diagnostic detail for HTTP fetching, resource embedding, caching, proxy authentication, SSL verification, file writing, ZIP output, and unpacking. |
Summary
Chilkat.Mht is the Chilkat class for creating and unpacking MIME-based web archives and email messages from HTML. It can convert URLs, local HTML files, and in-memory HTML strings to MHT or EML, embed related resources, save or ZIP the generated output, manage caching, use HTTP/SOCKS proxies, authenticate to websites, control script and image handling, and unpack MHT files into HTML plus supporting resources.
The most important practical guidance is to set BaseUrl for local or in-memory HTML with relative links, choose embedding and CID options deliberately, and configure proxy and authentication settings before fetching remote resources. Use caching for repeated conversions, and inspect LastErrorText and the debug HTML files when conversion output is unexpected.