Spider Component Features
- Crawl a web site.
- Accumulate outbound links for crawling other web sites.
- Cache pages so future crawls can fetch from cache.
- Robots.txt compliant.
- Fetch the HTML content of each page crawled.
- Able to crawl HTTPS pages.
- Define "avoid" patterns to skip URLs matching specific wildcard patterns.
- Define "avoid" patterns to skip matching outbound links.
- Configurable read and connect timeouts.
- Maximum URL size to avoid ever-growing URLs.
- Maximum response size to avoid pages with very large or infinite content.
- Wind-down count to set a limit on pages spidered per site.
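Several of the features above (robots.txt compliance, wildcard "avoid" patterns, and a maximum URL size) amount to a filter applied to each candidate URL before it is fetched. The Python sketch below shows that idea using the standard library's `urllib.robotparser` and `fnmatch` modules. It is not the Chilkat Spider API: `should_crawl`, `make_robots_parser`, and `MAX_URL_LENGTH` (with its value of 300) are hypothetical names chosen for illustration, and the sketch assumes the site's robots.txt text has already been downloaded.

```python
from fnmatch import fnmatchcase
from typing import Iterable
from urllib.robotparser import RobotFileParser

# Hypothetical limit chosen for illustration; not a documented default.
MAX_URL_LENGTH = 300


def make_robots_parser(robots_txt: str) -> RobotFileParser:
    """Build a robots.txt parser from text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def should_crawl(
    url: str,
    robots: RobotFileParser,
    avoid_patterns: Iterable[str],
    user_agent: str = "*",
    max_url_length: int = MAX_URL_LENGTH,
) -> bool:
    """Return True if url passes the length limit, avoid patterns, and robots.txt."""
    # Reject overly long URLs, which often signal a crawler trap.
    if len(url) > max_url_length:
        return False
    # Skip URLs matching any wildcard "avoid" pattern, e.g. "*/cgi-bin/*".
    if any(fnmatchcase(url, pattern) for pattern in avoid_patterns):
        return False
    # Finally, honor the site's robots.txt rules for this user agent.
    return robots.can_fetch(user_agent, url)


robots = make_robots_parser("User-agent: *\nDisallow: /private/\n")
print(should_crawl("https://example.com/index.html", robots, ["*/cgi-bin/*"]))  # True
print(should_crawl("https://example.com/private/a.html", robots, []))  # False
```

Note that `fnmatch` also treats `?` and `[...]` as wildcards, so an "avoid" pattern containing a literal query-string `?` will match any single character in that position; a spider component's own wildcard syntax may differ.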
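The crawl-side features (page caching, outbound link accumulation, maximum response size, and the wind-down count) can be pictured as a breadth-first loop over a single site. This is a minimal Python sketch under stated assumptions, not Chilkat's implementation: `crawl_site`, `LinkExtractor`, and the `fetch` callback are hypothetical, the wind-down count is treated as a plain cap on pages spidered, and the size limit is checked on the returned text rather than while reading the response.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Set, Tuple
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_site(
    start_url: str,
    fetch: Callable[[str], Optional[str]],  # hypothetical: returns HTML or None
    wind_down_count: int = 50,
    max_response_size: int = 1_000_000,
    cache: Optional[Dict[str, str]] = None,
) -> Tuple[Dict[str, str], Set[str]]:
    """Breadth-first crawl of the site hosting start_url.

    Returns (pages, outbound): pages maps each spidered URL to its HTML,
    outbound holds links to other hosts, kept for crawling other sites.
    """
    if cache is None:
        cache = {}
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages: Dict[str, str] = {}
    outbound: Set[str] = set()

    # Wind-down count (simplified here): stop once this many pages are spidered.
    while queue and len(pages) < wind_down_count:
        url = queue.popleft()
        html = cache.get(url)
        if html is None:
            html = fetch(url)
            # Skip failed fetches and responses over the size limit
            # (measured in characters of the returned text).
            if html is None or len(html) > max_response_size:
                continue
            cache[url] = html  # later crawls can reuse this page
        pages[url] = html

        extractor = LinkExtractor()
        extractor.feed(html)
        extractor.close()
        for href in extractor.links:
            link = urljoin(url, href).split("#", 1)[0]  # drop fragments
            parts = urlparse(link)
            if parts.scheme not in ("http", "https"):
                continue  # ignore mailto:, javascript:, etc.
            if parts.netloc != host:
                outbound.add(link)
            elif link not in seen:
                seen.add(link)
                queue.append(link)
    return pages, outbound
```

Passing `fetch` in as a function keeps the sketch offline and testable. A real spider would plug in an HTTP client with read and connect timeouts at that point, and would run each link through its robots.txt and "avoid" pattern checks before queuing it.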
Copyright 2000-2008 Chilkat Software, Inc. All rights reserved.
Send feedback to support@chilkatsoft.com.
Components for Microsoft Windows XP, 2000, 2003 Server, Vista, and Windows 95/98/NT4.