- Crawl a single website.
- Accumulate outbound links for crawling other websites.
- Cache pages so future crawls can fetch from cache.
- Robots.txt compliant.
- Fetch the HTML content of each page crawled.
- Able to crawl HTTPS pages.
- Define "avoid" patterns to avoid URLs matching specific wildcard patterns.
- Define "avoid" patterns for avoiding matching outbound links.
- Read and connect timeouts.
- Maximum URL size to avoid ever-growing URLs.
- Maximum response size to avoid pages with very large or infinite content.
- Wind-down count to set a limit on pages spidered per site.
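
Several of the features above (single-site crawling, accumulated outbound links, wildcard "avoid" patterns, a maximum URL size, and robots.txt compliance) come down to filtering each discovered link before it is queued. The following is a minimal sketch of that filtering using only the Python standard library. It is not the Spider component's API: the function names `is_avoided` and `classify_links`, their parameters, and the example.com URLs are all hypothetical and chosen for illustration.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def is_avoided(url, patterns):
    """Return True if the URL matches any wildcard "avoid" pattern (case-insensitive)."""
    lowered = url.lower()
    return any(fnmatchcase(lowered, p.lower()) for p in patterns)


def classify_links(site_host, links, avoid, avoid_outbound,
                   max_url_len, robots=None, agent="*"):
    """Split links into (to_crawl, outbound) for a crawl of a single site.

    Hypothetical helper: same-host links are kept for crawling unless they
    match an avoid pattern or robots.txt disallows them; links to other hosts
    are accumulated as outbound links unless they match an outbound avoid
    pattern. Over-long URLs are dropped in both cases.
    """
    to_crawl, outbound = [], []
    for url in links:
        if len(url) > max_url_len:
            continue  # guard against ever-growing URLs
        host = (urlparse(url).hostname or "").lower()
        if host == site_host.lower():
            if is_avoided(url, avoid):
                continue
            if robots is not None and not robots.can_fetch(agent, url):
                continue
            to_crawl.append(url)
        elif not is_avoided(url, avoid_outbound):
            outbound.append(url)
    return to_crawl, outbound


# Robots rules are parsed from literal lines here so the sketch runs offline.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])

links = [
    "https://www.example.com/a.html",
    "https://www.example.com/private/x.html",
    "https://www.example.com/java/applet.html",
    "https://other.example.org/page",
    "https://ads.example.net/banner",
    "https://www.example.com/" + "a/" * 600,
]
to_crawl, outbound = classify_links(
    "www.example.com", links,
    avoid=["*java*"], avoid_outbound=["*ads.*"],
    max_url_len=200, robots=robots,
)
print(to_crawl)
print(outbound)
```

A wind-down count could be layered on top of this by stopping the crawl loop once a fixed number of pages from `to_crawl` has been fetched; timeouts, caching, and the response-size limit belong to the fetching step and are not shown here.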
Copyright 2000-2010 Chilkat Software, Inc. All rights reserved.
Send feedback to support@chilkatsoft.com. Components for Microsoft Windows 7, Vista, XP, 2000, 2003 Server, and Windows 95/98/NT4.
