- Crawl a single website.
- Accumulate outbound links for crawling other websites.
- Cache pages so that future crawls can fetch them from the cache.
- Robots.txt compliant.
- Fetch the HTML content of each page crawled.
- Crawl HTTPS pages.
- Define "avoid" patterns to skip URLs that match specific wildcard patterns.
- Define "avoid" patterns to skip matching outbound links.
- Read and connect timeouts.
- Maximum URL size, to avoid ever-growing URLs.
- Maximum response size, to avoid pages with very large or infinite content.
- Wind-down count, to limit the number of pages spidered per site.
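Several of the features above (robots.txt compliance, URL and outbound-link "avoid" patterns, maximum URL size, maximum response size, and the wind-down count) are per-URL or per-response admission checks. The sketch below shows one way such checks could fit together, using only the Python standard library. It is NOT the Chilkat Spider API: the `CrawlPolicy` class and all of its method and parameter names are hypothetical, the wildcard matching assumes shell-style `*`/`?` semantics via `fnmatchcase` (Chilkat's own wildcard rules may differ), and treating the wind-down count as a hard per-site page cap, with 0 meaning "no limit", is an assumption.

```python
from fnmatch import fnmatchcase
from urllib.robotparser import RobotFileParser


class CrawlPolicy:
    """Hypothetical sketch of the crawl limits listed above; NOT the Chilkat Spider API."""

    def __init__(self, robots_txt, avoid_patterns=(), avoid_outbound_patterns=(),
                 max_url_len=300, max_response_size=300_000, wind_down_count=0,
                 user_agent="*"):
        # Parse robots.txt text that the caller has already downloaded.
        self.robots = RobotFileParser()
        self.robots.parse(robots_txt.splitlines())
        self.user_agent = user_agent
        self.avoid_patterns = list(avoid_patterns)
        self.avoid_outbound_patterns = list(avoid_outbound_patterns)
        self.max_url_len = max_url_len
        self.max_response_size = max_response_size
        self.wind_down_count = wind_down_count  # assumption: 0 means "no limit"
        self.pages_crawled = 0

    def allows_page(self, url):
        """Decide whether a same-site URL may be fetched."""
        if len(url) > self.max_url_len:  # guard against ever-growing URLs
            return False
        if any(fnmatchcase(url, p) for p in self.avoid_patterns):
            return False
        if self.wind_down_count and self.pages_crawled >= self.wind_down_count:
            return False  # per-site page limit reached
        return self.robots.can_fetch(self.user_agent, url)

    def keeps_outbound(self, url):
        """Decide whether an outbound link is worth recording for later crawls."""
        return not any(fnmatchcase(url, p) for p in self.avoid_outbound_patterns)

    def accepts_body(self, num_bytes):
        """Reject responses larger than the configured maximum size."""
        return num_bytes <= self.max_response_size

    def record_crawl(self):
        """Count one successfully spidered page toward the wind-down limit."""
        self.pages_crawled += 1


robots = "User-agent: *\nDisallow: /private/\n"
policy = CrawlPolicy(robots, avoid_patterns=["*/login*"],
                     avoid_outbound_patterns=["*.doubleclick.net*"],
                     max_url_len=100, wind_down_count=2)
print(policy.allows_page("https://example.com/docs/index.html"))  # True
print(policy.allows_page("https://example.com/private/a.html"))   # False: robots.txt
print(policy.allows_page("https://example.com/login?next=/"))     # False: avoid pattern
print(policy.keeps_outbound("https://ad.doubleclick.net/x"))      # False: outbound avoid pattern
```

Keeping the checks in one object means the fetch loop only has to ask "may I fetch this?" and "may I keep this link?", while the limits stay configurable in one place. Read and connect timeouts are left out because they belong to the HTTP client rather than to URL admission.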
Privacy Statement. Copyright 2000-2012 Chilkat Software, Inc. All rights reserved.
(Regarding the usage of the Android logo) Portions of this page are reproduced from work created and shared by Google and used according to terms described in the Creative Commons 3.0 Attribution License.
Send feedback to support@chilkatsoft.com
Software components and libraries for Linux, Mac OS X, iOS (iPhone), Android™, Solaris, RHEL/CentOS, Microsoft Windows 7, Vista, XP, 2000, 2003 Server, 2008 Server, and Windows 95/98/NT4.