Crawler-commons 0.6-SNAPSHOT API

Packages
crawlercommons  
crawlercommons.fetcher
The main fetching package within Crawler Commons: it defines the base fetching and encoding classes, enums that capture the reasons behind typical fetching behaviour, and the base exceptions that fetchers may throw.
crawlercommons.fetcher.file
This package contains SimpleFileFetcher, which extends BaseFetcher.
crawlercommons.fetcher.http
This package covers fetching over HTTP: SimpleHttpFetcher, which extends BaseHttpFetcher (itself an extension of BaseFetcher), provides the Crawler Commons HTTP fetching implementation.
crawlercommons.robots
The robots package contains all of the robots.txt rule inference, parsing, and related utilities within Crawler Commons.
crawlercommons.sitemaps
The sitemaps package provides the classes for sitemap parsing, URL definition, and processing.
crawlercommons.url
Classes in the url package relate to the definition of top-level domains, various domain registrars, and the effective handling of such domains.
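
The fetcher packages above follow a base-class-plus-implementations design. The sketch below illustrates that shape with JDK classes only; MiniFetcher, FetchedResult, and LocalFileFetcher are simplified stand-ins for the BaseFetcher/SimpleFileFetcher pair, not the library's actual signatures.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch only: these names are simplified stand-ins for the
// fetcher package's base/concrete classes, not the real API.
public class FetcherDemo {

    /** Minimal stand-in for a fetch result: the URL plus its raw bytes. */
    static class FetchedResult {
        final String url;
        final byte[] content;
        FetchedResult(String url, byte[] content) {
            this.url = url;
            this.content = content;
        }
    }

    /** Base abstraction: concrete fetchers (file, HTTP, ...) supply fetch(). */
    abstract static class MiniFetcher {
        abstract FetchedResult fetch(String url) throws Exception;
    }

    /** File-based fetcher: resolves a file: URL and reads it from disk. */
    static class LocalFileFetcher extends MiniFetcher {
        @Override
        FetchedResult fetch(String url) throws Exception {
            Path path = Path.of(java.net.URI.create(url));
            return new FetchedResult(url, Files.readAllBytes(path));
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("fetch-demo", ".txt");
        Files.write(tmp, "hello".getBytes());
        FetchedResult r = new LocalFileFetcher().fetch(tmp.toUri().toString());
        System.out.println(r.content.length); // 5
        Files.delete(tmp);
    }
}
```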
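
For HTTP fetching, the typical concerns are timeouts, user-agent identification, and redirect handling. A minimal sketch of that configuration using only the JDK's HttpURLConnection (the real SimpleHttpFetcher wraps an HTTP client and manages these for you; the user-agent string is illustrative). No request is sent here: openConnection() only creates the connection object.

```java
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch of HTTP fetch configuration; not the SimpleHttpFetcher API.
public class HttpFetchDemo {
    static HttpURLConnection configure(String url) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(10_000);        // fail fast on dead hosts
        conn.setReadTimeout(30_000);           // cap slow responses
        conn.setRequestProperty("User-Agent", "mycrawler/1.0"); // illustrative
        conn.setInstanceFollowRedirects(true); // follow 3xx hops
        return conn;
    }

    public static void main(String[] args) throws Exception {
        HttpURLConnection conn = configure("http://example.com/");
        System.out.println(conn.getConnectTimeout()); // 10000
    }
}
```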
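
The core decision the robots package supports is "may this path be crawled?". The simplified matcher below illustrates that decision with prefix-based disallow rules only; the library's actual parser additionally handles agent groups, wildcards, crawl-delay, and malformed input, and these class names are not its API.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of robots.txt rule matching, not the robots package API.
public class RobotsDemo {
    static class SimpleRules {
        final List<String> disallowed = new ArrayList<>();

        void addDisallow(String pathPrefix) {
            disallowed.add(pathPrefix);
        }

        /** A path is allowed unless some Disallow prefix matches it. */
        boolean isAllowed(String path) {
            for (String prefix : disallowed) {
                if (path.startsWith(prefix)) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        // Equivalent of:  User-agent: *  /  Disallow: /private/
        SimpleRules rules = new SimpleRules();
        rules.addDisallow("/private/");
        System.out.println(rules.isAllowed("/public/page.html")); // true
        System.out.println(rules.isAllowed("/private/secret"));   // false
    }
}
```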
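
At its core, sitemap parsing means extracting per-URL entries from an XML document. A sketch of that step with the JDK's DOM parser, pulling out the &lt;loc&gt; elements; the library's sitemap parser also deals with sitemap indexes, compressed sitemaps, and per-URL metadata such as lastmod and priority, none of which this sketch attempts.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Sketch of <loc> extraction from a sitemap, not the sitemaps package API.
public class SitemapDemo {
    static List<String> extractUrls(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                        xml.getBytes(StandardCharsets.UTF_8)));
        NodeList locs = doc.getElementsByTagName("loc");
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < locs.getLength(); i++) {
            urls.add(locs.item(i).getTextContent().trim());
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<urlset>"
                + "<url><loc>http://example.com/a</loc></url>"
                + "<url><loc>http://example.com/b</loc></url>"
                + "</urlset>";
        System.out.println(extractUrls(xml));
    }
}
```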
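
"Effective handling" of domains usually means reducing a hostname to its registered (paid-level) domain, which requires knowing multi-label suffixes like co.uk. The sketch below shows the idea with a tiny hard-coded suffix set standing in for full public-suffix data; it is a conceptual illustration, not the url package's API.

```java
import java.util.Arrays;
import java.util.Set;

// Conceptual sketch of paid-level-domain extraction; the tiny SUFFIXES
// set is a stand-in for real public-suffix data.
public class DomainDemo {
    static final Set<String> SUFFIXES = Set.of("com", "org", "co.uk");

    static String paidLevelDomain(String host) {
        String[] labels = host.split("\\.");
        // Try the longest candidate suffix first (e.g. "co.uk" before "uk").
        for (int i = 0; i < labels.length - 1; i++) {
            String suffix = String.join(".",
                    Arrays.copyOfRange(labels, i + 1, labels.length));
            if (SUFFIXES.contains(suffix)) {
                return labels[i] + "." + suffix; // label + public suffix
            }
        }
        return host; // no known suffix: return the host unchanged
    }

    public static void main(String[] args) {
        System.out.println(paidLevelDomain("news.bbc.co.uk"));  // bbc.co.uk
        System.out.println(paidLevelDomain("www.example.com")); // example.com
    }
}
```

Naive "last two labels" logic would wrongly report co.uk as the registered domain of news.bbc.co.uk, which is why suffix data matters here.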

Copyright © 2015. All rights reserved.