A B C D E F G H I K L M N P R S T U V W _ 

A

abort() - Method in class crawlercommons.fetcher.BaseFetcher
Terminate any async request being processed.
abort() - Method in class crawlercommons.fetcher.file.SimpleFileFetcher
 
abort() - Method in class crawlercommons.fetcher.http.SimpleHttpFetcher
 
AbortedFetchException - Exception in crawlercommons.fetcher
 
AbortedFetchException() - Constructor for exception crawlercommons.fetcher.AbortedFetchException
 
AbortedFetchException(String, AbortedFetchReason) - Constructor for exception crawlercommons.fetcher.AbortedFetchException
 
AbortedFetchException(String, String, AbortedFetchReason) - Constructor for exception crawlercommons.fetcher.AbortedFetchException
 
AbortedFetchReason - Enum in crawlercommons.fetcher
 
AbstractSiteMap - Class in crawlercommons.sitemaps
SiteMap or SiteMapIndex
AbstractSiteMap() - Constructor for class crawlercommons.sitemaps.AbstractSiteMap
 
AbstractSiteMap.SitemapType - Enum in crawlercommons.sitemaps
Various Sitemap types
addCookie(Cookie) - Method in class crawlercommons.fetcher.http.LocalCookieStore
Adds an HTTP cookie, replacing any existing equivalent cookies.
addCookies(Cookie[]) - Method in class crawlercommons.fetcher.http.LocalCookieStore
Adds an array of HTTP cookies.
addRule(String, boolean) - Method in class crawlercommons.robots.SimpleRobotRules
 
addSitemap(String) - Method in class crawlercommons.robots.BaseRobotRules
 
addSiteMapUrl(SiteMapURL) - Method in class crawlercommons.sitemaps.SiteMap
 
addValidMimeType(String) - Method in class crawlercommons.fetcher.BaseFetcher
 
addValidMimeTypes(Set<String>) - Method in class crawlercommons.fetcher.BaseFetcher
 

B

BadProtocolFetchException - Exception in crawlercommons.fetcher
 
BadProtocolFetchException() - Constructor for exception crawlercommons.fetcher.BadProtocolFetchException
 
BadProtocolFetchException(String) - Constructor for exception crawlercommons.fetcher.BadProtocolFetchException
 
BaseFetcher - Class in crawlercommons.fetcher
 
BaseFetcher() - Constructor for class crawlercommons.fetcher.BaseFetcher
 
BaseFetchException - Exception in crawlercommons.fetcher
 
BaseFetchException() - Constructor for exception crawlercommons.fetcher.BaseFetchException
 
BaseFetchException(String) - Constructor for exception crawlercommons.fetcher.BaseFetchException
 
BaseFetchException(String, String) - Constructor for exception crawlercommons.fetcher.BaseFetchException
 
BaseFetchException(String, Exception) - Constructor for exception crawlercommons.fetcher.BaseFetchException
 
BaseFetchException(String, String, Exception) - Constructor for exception crawlercommons.fetcher.BaseFetchException
 
BaseHttpFetcher - Class in crawlercommons.fetcher.http
 
BaseHttpFetcher(int, UserAgent) - Constructor for class crawlercommons.fetcher.http.BaseHttpFetcher
 
BaseHttpFetcher.RedirectMode - Enum in crawlercommons.fetcher.http
 
BaseRobotRules - Class in crawlercommons.robots
Result from parsing a single robots.txt file - which means we get a set of rules, and a crawl-delay.
BaseRobotRules() - Constructor for class crawlercommons.robots.BaseRobotRules
 
BaseRobotsParser - Class in crawlercommons.robots
 
BaseRobotsParser() - Constructor for class crawlercommons.robots.BaseRobotsParser
 

C

clear() - Method in class crawlercommons.fetcher.http.LocalCookieStore
Clears all cookies.
clear() - Method in class crawlercommons.fetcher.Payload
 
clearExpired(Date) - Method in class crawlercommons.fetcher.http.LocalCookieStore
Removes all cookies in this HTTP state that have expired by the specified date.
clearRules() - Method in class crawlercommons.robots.SimpleRobotRules
 
COMMENT - Static variable in class crawlercommons.url.EffectiveTldFinder
 
compareTo(SimpleRobotRules.RobotRule) - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
 
compareToBase(BaseFetchException) - Method in exception crawlercommons.fetcher.BaseFetchException
 
containsKey(Object) - Method in class crawlercommons.fetcher.Payload
 
containsValue(Object) - Method in class crawlercommons.fetcher.Payload
 
convertToDate(String) - Static method in class crawlercommons.sitemaps.AbstractSiteMap
Converts the given date (given in an acceptable DateFormat); returns null if the date is not in the correct format.
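The "null on a bad format" contract described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: the class name `DateSketch`, the method name `convertToDateOrNull`, and the two patterns are assumptions, and the formats the real method accepts may differ.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

final class DateSketch {
    // Hypothetical pattern list; the formats accepted by the real method may differ.
    private static final String[] PATTERNS = {
        "yyyy-MM-dd'T'HH:mm:ssXXX",
        "yyyy-MM-dd"
    };

    // Returns the parsed date, or null when no pattern matches the whole string.
    static Date convertToDateOrNull(String text) {
        if (text == null) {
            return null;
        }
        String trimmed = text.trim();
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern);
            format.setLenient(false);
            format.setTimeZone(TimeZone.getTimeZone("UTC"));
            ParsePosition position = new ParsePosition(0);
            Date parsed = format.parse(trimmed, position);
            // Accept the result only if the entire input was consumed.
            if (parsed != null && position.getIndex() == trimmed.length()) {
                return parsed;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(convertToDateOrNull("2015-01-02"));
        System.out.println(convertToDateOrNull("not a date"));
    }
}
```

Returning null rather than throwing leaves the decision about a malformed date to the caller, which matches the wording of the description above.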
crawlercommons - package crawlercommons
 
CrawlerCommons - Class in crawlercommons
 
CrawlerCommons() - Constructor for class crawlercommons.CrawlerCommons
 
crawlercommons.fetcher - package crawlercommons.fetcher
The main fetching package within Crawler Commons. This package defines the base fetching and encoding classes, enums describing the reasons behind typical fetching behaviour, and the base exceptions that may be used.
crawlercommons.fetcher.file - package crawlercommons.fetcher.file
This package includes the SimpleFileFetcher code which extends the BaseFetcher.
crawlercommons.fetcher.http - package crawlercommons.fetcher.http
This package concerns the fetching of files over HTTP. Extending BaseHttpFetcher (which itself extends BaseFetcher), SimpleHttpFetcher provides the Crawler Commons HTTP fetching implementation.
crawlercommons.robots - package crawlercommons.robots
The robots package contains all of the robots.txt rule inference, parsing and utility code in Crawler Commons.
crawlercommons.sitemaps - package crawlercommons.sitemaps
The sitemaps package provides all classes relevant to focused sitemap parsing, URL definition and processing.
crawlercommons.url - package crawlercommons.url
Classes contained within the url package relate to the definition of top-level domains, various domain registrars and the effective handling of such domains.
createFetcher(BaseHttpFetcher) - Static method in class crawlercommons.robots.RobotUtils
 
createFetcher(UserAgent, int) - Static method in class crawlercommons.robots.RobotUtils
 

D

DEFAULT_ACCEPT_LANGUAGE - Static variable in class crawlercommons.fetcher.http.BaseHttpFetcher
 
DEFAULT_BROWSER_VERSION - Static variable in class crawlercommons.fetcher.http.UserAgent
 
DEFAULT_CRAWLER_VERSION - Static variable in class crawlercommons.fetcher.http.UserAgent
 
DEFAULT_MAX_CONNECTIONS_PER_HOST - Static variable in class crawlercommons.fetcher.http.BaseHttpFetcher
 
DEFAULT_MAX_CONTENT_SIZE - Static variable in class crawlercommons.fetcher.BaseFetcher
 
DEFAULT_MAX_REDIRECTS - Static variable in class crawlercommons.fetcher.http.BaseHttpFetcher
 
DEFAULT_MIN_RESPONSE_RATE - Static variable in class crawlercommons.fetcher.http.BaseHttpFetcher
 
DEFAULT_REDIRECT_MODE - Static variable in class crawlercommons.fetcher.http.BaseHttpFetcher
 
defaultPriority - Static variable in class crawlercommons.sitemaps.SiteMapURL
 
DOT - Static variable in class crawlercommons.url.EffectiveTldFinder
 
DOT_REGEX - Static variable in class crawlercommons.url.EffectiveTldFinder
 

E

EffectiveTldFinder - Class in crawlercommons.url
Given a URL's hostname, determining the actual domain requires knowledge of the various domain registrars and their assignment policies.
EffectiveTldFinder.EffectiveTLD - Class in crawlercommons.url
 
EffectiveTldFinder.EffectiveTLD(String) - Constructor for class crawlercommons.url.EffectiveTldFinder.EffectiveTLD
 
EncodingUtils - Class in crawlercommons.fetcher
 
EncodingUtils() - Constructor for class crawlercommons.fetcher.EncodingUtils
 
EncodingUtils.ExpandedResult - Class in crawlercommons.fetcher
 
EncodingUtils.ExpandedResult(byte[], boolean) - Constructor for class crawlercommons.fetcher.EncodingUtils.ExpandedResult
 
entrySet() - Method in class crawlercommons.fetcher.Payload
 
equals(Object) - Method in exception crawlercommons.fetcher.BaseFetchException
 
equals(Object) - Method in class crawlercommons.fetcher.Payload
 
equals(Object) - Method in class crawlercommons.robots.BaseRobotRules
 
equals(Object) - Method in class crawlercommons.robots.SimpleRobotRules
 
equals(Object) - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
 
equals(Object) - Method in class crawlercommons.sitemaps.SiteMapURL
 
ETLD_DATA - Static variable in class crawlercommons.url.EffectiveTldFinder
 
EXCEPTION - Static variable in class crawlercommons.url.EffectiveTldFinder
 

F

failedFetch(int) - Method in class crawlercommons.robots.BaseRobotsParser
The fetch of robots.txt failed, so return rules appropriate given the HTTP status code.
failedFetch(int) - Method in class crawlercommons.robots.SimpleRobotRulesParser
 
fetch(String) - Method in class crawlercommons.fetcher.http.SimpleHttpFetcher
 
fetch(HttpRequestBase, String, Payload) - Method in class crawlercommons.fetcher.http.SimpleHttpFetcher
 
FetchedResult - Class in crawlercommons.fetcher
 
FetchedResult(String, String, long, Metadata, byte[], String, int, Payload, String, int, String, int, String) - Constructor for class crawlercommons.fetcher.FetchedResult
 
finalize() - Method in class crawlercommons.fetcher.http.SimpleHttpFetcher
 

G

get(String) - Method in class crawlercommons.fetcher.BaseFetcher
 
get(String, Payload) - Method in class crawlercommons.fetcher.BaseFetcher
Get the content stored in the resource referenced by
get(String, Payload) - Method in class crawlercommons.fetcher.file.SimpleFileFetcher
 
get(String, Payload) - Method in class crawlercommons.fetcher.http.SimpleHttpFetcher
 
get(Object) - Method in class crawlercommons.fetcher.Payload
 
getAbortReason() - Method in exception crawlercommons.fetcher.AbortedFetchException
 
getAcceptLanguage() - Method in class crawlercommons.fetcher.http.BaseHttpFetcher
 
getAgentName() - Method in class crawlercommons.fetcher.http.UserAgent
Obtain just the user agent name.
getAssignedDomain(String) - Static method in class crawlercommons.url.EffectiveTldFinder
This method uses the effective TLD to determine which component of a FQDN is the NIC-assigned domain name.
getBaseUrl() - Method in class crawlercommons.fetcher.FetchedResult
 
getBaseUrl() - Method in class crawlercommons.sitemaps.SiteMap
 
getCause() - Method in exception crawlercommons.fetcher.BaseFetchException
 
getChangeFrequency() - Method in class crawlercommons.sitemaps.SiteMapURL
Return the URL's change frequency
getConnectionTimeout() - Method in class crawlercommons.fetcher.http.SimpleHttpFetcher
 
getContent() - Method in class crawlercommons.fetcher.FetchedResult
 
getContentLength() - Method in class crawlercommons.fetcher.FetchedResult
 
getContentType() - Method in class crawlercommons.fetcher.FetchedResult
 
getCookies() - Method in class crawlercommons.fetcher.http.LocalCookieStore
Returns an immutable array of cookies that this HTTP state currently contains.
getCrawlDelay() - Method in class crawlercommons.robots.BaseRobotRules
 
getDefaultMaxContentSize() - Method in class crawlercommons.fetcher.BaseFetcher
 
getDomain() - Method in class crawlercommons.url.EffectiveTldFinder.EffectiveTLD
 
getEffectiveTLD(String) - Static method in class crawlercommons.url.EffectiveTldFinder
 
getEffectiveTLDs() - Static method in class crawlercommons.url.EffectiveTldFinder
 
getError() - Method in exception crawlercommons.sitemaps.UnknownFormatException
Public method, callable by the exception catcher.
getExpanded() - Method in class crawlercommons.fetcher.EncodingUtils.ExpandedResult
 
getFetchedUrl() - Method in class crawlercommons.fetcher.FetchedResult
 
getFetchTime() - Method in class crawlercommons.fetcher.FetchedResult
 
getFullDateFormat() - Static method in class crawlercommons.sitemaps.AbstractSiteMap
 
getHeaders() - Method in class crawlercommons.fetcher.FetchedResult
 
getHostAddress() - Method in class crawlercommons.fetcher.FetchedResult
 
getHttpHeaders() - Method in exception crawlercommons.fetcher.HttpFetchException
 
getHttpStatus() - Method in exception crawlercommons.fetcher.HttpFetchException
 
getHttpVersion() - Method in class crawlercommons.fetcher.http.SimpleHttpFetcher
 
getInstance() - Static method in class crawlercommons.url.EffectiveTldFinder
 
getKeepAliveDuration(HttpResponse, HttpContext) - Method in class crawlercommons.fetcher.http.SimpleHttpFetcher.MyConnectionKeepAliveStrategy
 
getLastModified() - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
getLastModified() - Method in class crawlercommons.sitemaps.SiteMapURL
Return when this URL was last modified.
getLocalizedMessage() - Method in exception crawlercommons.fetcher.BaseFetchException
 
getMaxConnectionsPerHost() - Method in class crawlercommons.fetcher.http.BaseHttpFetcher
 
getMaxContentSize(String) - Method in class crawlercommons.fetcher.BaseFetcher
 
getMaxFetchTime() - Static method in class crawlercommons.robots.RobotUtils
 
getMaxRedirects() - Method in class crawlercommons.fetcher.http.BaseHttpFetcher
 
getMaxRetryCount() - Method in class crawlercommons.fetcher.http.SimpleHttpFetcher
 
getMaxThreads() - Method in class crawlercommons.fetcher.http.BaseHttpFetcher
 
getMessage() - Method in exception crawlercommons.fetcher.BaseFetchException
 
getMessage() - Method in exception crawlercommons.fetcher.HttpFetchException
 
getMimeTypeFromContentType(String) - Static method in class crawlercommons.fetcher.BaseFetcher
 
getMinResponseRate() - Method in class crawlercommons.fetcher.http.BaseHttpFetcher
Return the minimum response rate.
getNewBaseUrl() - Method in class crawlercommons.fetcher.FetchedResult
 
getNumRedirects() - Method in class crawlercommons.fetcher.FetchedResult
 
getNumWarnings() - Method in class crawlercommons.robots.SimpleRobotRulesParser
 
getPayload() - Method in class crawlercommons.fetcher.FetchedResult
 
getPLD(String) - Static method in class crawlercommons.url.PaidLevelDomain
Extract the PLD (paid-level domain) from the hostname.
getPLD(URL) - Static method in class crawlercommons.url.PaidLevelDomain
Extract the PLD (paid-level domain) from the URL.
getPriority() - Method in class crawlercommons.sitemaps.SiteMapURL
Return this URL's priority (a value in the range [0.0, 1.0]).
getReason() - Method in exception crawlercommons.fetcher.RedirectFetchException
 
getReasonPhrase() - Method in class crawlercommons.fetcher.FetchedResult
 
getRedirectedUrl() - Method in exception crawlercommons.fetcher.RedirectFetchException
 
getRedirectMode() - Method in class crawlercommons.fetcher.http.BaseHttpFetcher
 
getResponseRate() - Method in class crawlercommons.fetcher.FetchedResult
 
getRobotRules(BaseHttpFetcher, BaseRobotsParser, URL) - Static method in class crawlercommons.robots.RobotUtils
Externally visible, static method for use in tools and for testing.
getSitemap(URL) - Method in class crawlercommons.sitemaps.SiteMapIndex
Returns the Sitemap that has the given URL.
getSitemaps() - Method in class crawlercommons.robots.BaseRobotRules
 
getSitemaps() - Method in class crawlercommons.sitemaps.SiteMapIndex
 
getSiteMapUrls() - Method in class crawlercommons.sitemaps.SiteMap
 
getSocketTimeout() - Method in class crawlercommons.fetcher.http.SimpleHttpFetcher
 
getStackTrace() - Method in exception crawlercommons.fetcher.BaseFetchException
 
getStatusCode() - Method in class crawlercommons.fetcher.FetchedResult
 
getType() - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
getUrl() - Method in exception crawlercommons.fetcher.BaseFetchException
 
getUrl() - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
getUrl() - Method in class crawlercommons.sitemaps.SiteMapURL
Return the URL.
getUserAgent() - Method in class crawlercommons.fetcher.http.BaseHttpFetcher
 
getUserAgentString() - Method in class crawlercommons.fetcher.http.UserAgent
Obtain a String representing the user agent characteristics.
getValidMimeTypes() - Method in class crawlercommons.fetcher.BaseFetcher
 
getVersion() - Static method in class crawlercommons.CrawlerCommons
 

H

hashCode() - Method in exception crawlercommons.fetcher.BaseFetchException
 
hashCode() - Method in class crawlercommons.fetcher.Payload
 
hashCode() - Method in class crawlercommons.robots.BaseRobotRules
 
hashCode() - Method in class crawlercommons.robots.SimpleRobotRules
 
hashCode() - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
 
hashCode() - Method in class crawlercommons.sitemaps.SiteMapURL
 
hasUnprocessedSitemap() - Method in class crawlercommons.sitemaps.SiteMapIndex
 
HttpFetchException - Exception in crawlercommons.fetcher
 
HttpFetchException() - Constructor for exception crawlercommons.fetcher.HttpFetchException
 
HttpFetchException(String, String, int, Metadata) - Constructor for exception crawlercommons.fetcher.HttpFetchException
 

I

initCause(Throwable) - Method in exception crawlercommons.fetcher.BaseFetchException
 
initialize(InputStream) - Method in class crawlercommons.url.EffectiveTldFinder
 
IOFetchException - Exception in crawlercommons.fetcher
 
IOFetchException() - Constructor for exception crawlercommons.fetcher.IOFetchException
 
IOFetchException(String, IOException) - Constructor for exception crawlercommons.fetcher.IOFetchException
 
isAllowAll() - Method in class crawlercommons.robots.BaseRobotRules
 
isAllowAll() - Method in class crawlercommons.robots.SimpleRobotRules
Is our ruleset set up to allow all access?
isAllowed(String) - Method in class crawlercommons.robots.BaseRobotRules
 
isAllowed(String) - Method in class crawlercommons.robots.SimpleRobotRules
 
isAllowNone() - Method in class crawlercommons.robots.BaseRobotRules
 
isAllowNone() - Method in class crawlercommons.robots.SimpleRobotRules
Is our ruleset set up to disallow all access?
isConfigured() - Method in class crawlercommons.url.EffectiveTldFinder
 
isDeferVisits() - Method in class crawlercommons.robots.BaseRobotRules
 
isEmpty() - Method in class crawlercommons.fetcher.Payload
 
isException() - Method in class crawlercommons.url.EffectiveTldFinder.EffectiveTLD
 
isIndex() - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
isIndex() - Method in class crawlercommons.sitemaps.SiteMap
 
isIndex() - Method in class crawlercommons.sitemaps.SiteMapIndex
 
isProcessed() - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
isStrict() - Method in class crawlercommons.sitemaps.SiteMapParser
 
isTruncated() - Method in class crawlercommons.fetcher.EncodingUtils.ExpandedResult
 
isValid() - Method in class crawlercommons.sitemaps.SiteMapURL
 
isWild() - Method in class crawlercommons.url.EffectiveTldFinder.EffectiveTLD
 

K

keySet() - Method in class crawlercommons.fetcher.Payload
 

L

LocalCookieStore - Class in crawlercommons.fetcher.http
Default implementation of CookieStore. Initially copied from HttpComponents. Changes: removed synchronization.
LocalCookieStore() - Constructor for class crawlercommons.fetcher.http.LocalCookieStore
 
LOG - Static variable in class crawlercommons.sitemaps.SiteMapParser
 

M

main(String[]) - Static method in class crawlercommons.sitemaps.SiteMapTester
 
MAX_BYTES_ALLOWED - Static variable in class crawlercommons.sitemaps.SiteMapParser
Sitemap docs must be limited to 10MB (10,485,760 bytes)
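The figure above is 10 × 1024 × 1024 bytes. A minimal sketch of a size check built on that arithmetic might look like this; the class name `SizeLimitSketch`, the helper `withinLimit`, and the choice to treat the limit as inclusive are assumptions for illustration, not the parser's actual behaviour.

```java
final class SizeLimitSketch {
    // 10 MB expressed in bytes: 10 * 1024 * 1024 = 10,485,760.
    static final int MAX_BYTES_ALLOWED = 10 * 1024 * 1024;

    // Hypothetical helper: true when the content fits within the limit.
    // Treating the limit as inclusive is an assumption of this sketch.
    static boolean withinLimit(byte[] content) {
        return content != null && content.length <= MAX_BYTES_ALLOWED;
    }

    public static void main(String[] args) {
        System.out.println(MAX_BYTES_ALLOWED);
        System.out.println(withinLimit(new byte[1024]));
    }
}
```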

N

nextUnprocessedSitemap() - Method in class crawlercommons.sitemaps.SiteMapIndex
 
NO_MIN_RESPONSE_RATE - Static variable in class crawlercommons.fetcher.http.BaseHttpFetcher
 
NO_REDIRECTS - Static variable in class crawlercommons.fetcher.http.BaseHttpFetcher
 

P

PaidLevelDomain - Class in crawlercommons.url
Routines to extract the PLD (paid-level domain, as per the IRLbot paper) from a hostname or URL.
PaidLevelDomain() - Constructor for class crawlercommons.url.PaidLevelDomain
 
parseContent(String, byte[], String, String) - Method in class crawlercommons.robots.BaseRobotsParser
Parse the robots.txt file in , and return rules appropriate for processing paths by
parseContent(String, byte[], String, String) - Method in class crawlercommons.robots.SimpleRobotRulesParser
 
parseSiteMap(URL) - Method in class crawlercommons.sitemaps.SiteMapParser
Returns a SiteMap or SiteMapIndex given an online sitemap URL.
Please note that this method goes online, fetches the sitemap and then parses it.

This is a convenience method for a user who has a sitemap URL and wants a "keep it simple" way to parse it.
parseSiteMap(String, byte[], AbstractSiteMap) - Method in class crawlercommons.sitemaps.SiteMapParser
Returns a processed copy of an unprocessed sitemap object, i.e.
parseSiteMap(byte[], URL) - Method in class crawlercommons.sitemaps.SiteMapParser
 
parseSiteMap(String, byte[], URL) - Method in class crawlercommons.sitemaps.SiteMapParser
 
Payload - Class in crawlercommons.fetcher
 
Payload() - Constructor for class crawlercommons.fetcher.Payload
 
printStackTrace() - Method in exception crawlercommons.fetcher.BaseFetchException
 
printStackTrace(PrintStream) - Method in exception crawlercommons.fetcher.BaseFetchException
 
printStackTrace(PrintWriter) - Method in exception crawlercommons.fetcher.BaseFetchException
 
processDeflateEncoded(byte[]) - Static method in class crawlercommons.fetcher.EncodingUtils
 
processDeflateEncoded(byte[], int) - Static method in class crawlercommons.fetcher.EncodingUtils
 
processGzipEncoded(byte[]) - Static method in class crawlercommons.fetcher.EncodingUtils
 
processGzipEncoded(byte[], int) - Static method in class crawlercommons.fetcher.EncodingUtils
 
put(String, Object) - Method in class crawlercommons.fetcher.Payload
 
putAll(Map<? extends String, ? extends Object>) - Method in class crawlercommons.fetcher.Payload
 

R

readBaseFields(DataInput) - Method in exception crawlercommons.fetcher.BaseFetchException
 
RedirectFetchException - Exception in crawlercommons.fetcher
 
RedirectFetchException() - Constructor for exception crawlercommons.fetcher.RedirectFetchException
 
RedirectFetchException(String, String, RedirectFetchException.RedirectExceptionReason) - Constructor for exception crawlercommons.fetcher.RedirectFetchException
 
RedirectFetchException.RedirectExceptionReason - Enum in crawlercommons.fetcher
 
remove(Object) - Method in class crawlercommons.fetcher.Payload
 
report() - Method in class crawlercommons.fetcher.FetchedResult
Produces a neat report containing everything from a FetchedResult.
RobotUtils - Class in crawlercommons.robots
 
RobotUtils() - Constructor for class crawlercommons.robots.RobotUtils
 
run() - Method in class crawlercommons.fetcher.http.SimpleHttpFetcher.IdleConnectionMonitorThread
 

S

setAcceptLanguage(String) - Method in class crawlercommons.fetcher.http.BaseHttpFetcher
 
setChangeFrequency(SiteMapURL.ChangeFrequency) - Method in class crawlercommons.sitemaps.SiteMapURL
Set the URL's change frequency
setChangeFrequency(String) - Method in class crawlercommons.sitemaps.SiteMapURL
Set the URL's change frequency. In case of a bad ChangeFrequency, the current frequency in this instance will be set to null.
setConnectionTimeout(int) - Method in class crawlercommons.fetcher.http.SimpleHttpFetcher
 
setCrawlDelay(long) - Method in class crawlercommons.robots.BaseRobotRules
 
setDefaultMaxContentSize(int) - Method in class crawlercommons.fetcher.BaseFetcher
 
setDeferVisits(boolean) - Method in class crawlercommons.robots.BaseRobotRules
 
setExpanded(byte[]) - Method in class crawlercommons.fetcher.EncodingUtils.ExpandedResult
 
setHttpVersion(HttpVersion) - Method in class crawlercommons.fetcher.http.SimpleHttpFetcher
 
setLastModified(Date) - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
setLastModified(String) - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
setLastModified(String) - Method in class crawlercommons.sitemaps.SiteMapURL
Set when this URL was last modified.
setLastModified(Date) - Method in class crawlercommons.sitemaps.SiteMapURL
Set when this URL was last modified.
setMaxConnectionsPerHost(int) - Method in class crawlercommons.fetcher.http.BaseHttpFetcher
 
setMaxContentSize(String, int) - Method in class crawlercommons.fetcher.BaseFetcher
 
setMaxRedirects(int) - Method in class crawlercommons.fetcher.http.BaseHttpFetcher
 
setMaxRetryCount(int) - Method in class crawlercommons.fetcher.http.SimpleHttpFetcher
 
setMinResponseRate(int) - Method in class crawlercommons.fetcher.http.BaseHttpFetcher
 
setPayload(Payload) - Method in class crawlercommons.fetcher.FetchedResult
 
setPriority(double) - Method in class crawlercommons.sitemaps.SiteMapURL
Set the URL's priority to a value in the range [0.0, 1.0] (the default priority is used if the given priority is out of range).
setPriority(String) - Method in class crawlercommons.sitemaps.SiteMapURL
Set the URL's priority to a value in the range [0.0, 1.0] (the default priority is used if the given priority is missing or out of range).
setProcessed(boolean) - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
setRedirectMode(BaseHttpFetcher.RedirectMode) - Method in class crawlercommons.fetcher.http.BaseHttpFetcher
 
setSocketTimeout(int) - Method in class crawlercommons.fetcher.http.SimpleHttpFetcher
 
setStackTrace(StackTraceElement[]) - Method in exception crawlercommons.fetcher.BaseFetchException
 
setTruncated(boolean) - Method in class crawlercommons.fetcher.EncodingUtils.ExpandedResult
 
setUrl(URL) - Method in class crawlercommons.sitemaps.SiteMapURL
Set the URL.
setUrl(String) - Method in class crawlercommons.sitemaps.SiteMapURL
Set the URL.
setValid(boolean) - Method in class crawlercommons.sitemaps.SiteMapURL
 
setValidMimeTypes(Set<String>) - Method in class crawlercommons.fetcher.BaseFetcher
 
SimpleFileFetcher - Class in crawlercommons.fetcher.file
 
SimpleFileFetcher() - Constructor for class crawlercommons.fetcher.file.SimpleFileFetcher
 
SimpleHttpFetcher - Class in crawlercommons.fetcher.http
 
SimpleHttpFetcher(UserAgent) - Constructor for class crawlercommons.fetcher.http.SimpleHttpFetcher
 
SimpleHttpFetcher(int, UserAgent) - Constructor for class crawlercommons.fetcher.http.SimpleHttpFetcher
 
SimpleHttpFetcher.IdleConnectionMonitorThread - Class in crawlercommons.fetcher.http
 
SimpleHttpFetcher.IdleConnectionMonitorThread(ClientConnectionManager) - Constructor for class crawlercommons.fetcher.http.SimpleHttpFetcher.IdleConnectionMonitorThread
 
SimpleHttpFetcher.MyConnectionKeepAliveStrategy - Class in crawlercommons.fetcher.http
 
SimpleHttpFetcher.MyConnectionKeepAliveStrategy() - Constructor for class crawlercommons.fetcher.http.SimpleHttpFetcher.MyConnectionKeepAliveStrategy
 
SimpleRobotRules - Class in crawlercommons.robots
Result from parsing a single robots.txt file - which means we get a set of rules, and a crawl-delay.
SimpleRobotRules() - Constructor for class crawlercommons.robots.SimpleRobotRules
 
SimpleRobotRules(SimpleRobotRules.RobotRulesMode) - Constructor for class crawlercommons.robots.SimpleRobotRules
 
SimpleRobotRules.RobotRule - Class in crawlercommons.robots
Single rule that maps from a path prefix to an allow flag.
SimpleRobotRules.RobotRule(String, boolean) - Constructor for class crawlercommons.robots.SimpleRobotRules.RobotRule
 
SimpleRobotRules.RobotRulesMode - Enum in crawlercommons.robots
 
SimpleRobotRulesParser - Class in crawlercommons.robots
 
SimpleRobotRulesParser() - Constructor for class crawlercommons.robots.SimpleRobotRulesParser
 
SiteMap - Class in crawlercommons.sitemaps
 
SiteMap() - Constructor for class crawlercommons.sitemaps.SiteMap
 
SiteMap(URL) - Constructor for class crawlercommons.sitemaps.SiteMap
 
SiteMap(String) - Constructor for class crawlercommons.sitemaps.SiteMap
 
SiteMap(URL, Date) - Constructor for class crawlercommons.sitemaps.SiteMap
 
SiteMap(String, String) - Constructor for class crawlercommons.sitemaps.SiteMap
 
SiteMapIndex - Class in crawlercommons.sitemaps
 
SiteMapIndex() - Constructor for class crawlercommons.sitemaps.SiteMapIndex
 
SiteMapIndex(URL) - Constructor for class crawlercommons.sitemaps.SiteMapIndex
 
SiteMapParser - Class in crawlercommons.sitemaps
 
SiteMapParser() - Constructor for class crawlercommons.sitemaps.SiteMapParser
 
SiteMapParser(boolean) - Constructor for class crawlercommons.sitemaps.SiteMapParser
 
SiteMapTester - Class in crawlercommons.sitemaps
Sitemap tool for recursively fetching all URLs from a sitemap (and all of its children).
SiteMapTester() - Constructor for class crawlercommons.sitemaps.SiteMapTester
 
SiteMapURL - Class in crawlercommons.sitemaps
The SiteMapURL class represents a URL found in a Sitemap.
SiteMapURL(String, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
 
SiteMapURL(URL, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
 
SiteMapURL(String, String, String, String, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
 
SiteMapURL(URL, Date, SiteMapURL.ChangeFrequency, double, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
 
SiteMapURL.ChangeFrequency - Enum in crawlercommons.sitemaps
Allowed change frequencies
size() - Method in class crawlercommons.fetcher.Payload
 
sortRules() - Method in class crawlercommons.robots.SimpleRobotRules
In order to match up with Google's convention, we want to match rules from longest to shortest.
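The longest-to-shortest ordering described for sortRules() can be illustrated with a small self-contained sketch. The `Rule` type, the method names, and the "allow when nothing matches" default below are assumptions made for the example and are not taken from SimpleRobotRules itself.

```java
import java.util.ArrayList;
import java.util.List;

final class RuleOrderSketch {
    // Hypothetical stand-in for a rule mapping a path prefix to an allow flag.
    static final class Rule {
        final String prefix;
        final boolean allow;

        Rule(String prefix, boolean allow) {
            this.prefix = prefix;
            this.allow = allow;
        }
    }

    // Order rules so that longer (more specific) prefixes come first.
    static void sortLongestFirst(List<Rule> rules) {
        rules.sort((a, b) -> Integer.compare(b.prefix.length(), a.prefix.length()));
    }

    // The first matching prefix wins; with sorted rules that is the longest match.
    static boolean isAllowed(List<Rule> sortedRules, String path) {
        for (Rule rule : sortedRules) {
            if (path.startsWith(rule.prefix)) {
                return rule.allow;
            }
        }
        return true; // Assumption of this sketch: no matching rule means allowed.
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("/", false));
        rules.add(new Rule("/public/", true));
        sortLongestFirst(rules);
        System.out.println(isAllowed(rules, "/public/index.html"));
        System.out.println(isAllowed(rules, "/private/index.html"));
    }
}
```

Sorting once up front means each lookup can stop at the first matching prefix, so a specific rule such as `/public/` takes precedence over a broad one such as `/`.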

T

toString() - Method in exception crawlercommons.fetcher.BaseFetchException
 
toString() - Method in class crawlercommons.fetcher.http.LocalCookieStore
 
toString() - Method in class crawlercommons.sitemaps.SiteMap
 
toString() - Method in class crawlercommons.sitemaps.SiteMapIndex
 
toString() - Method in class crawlercommons.sitemaps.SiteMapURL
 
toString() - Method in class crawlercommons.url.EffectiveTldFinder.EffectiveTLD
 

U

UnknownFormatException - Exception in crawlercommons.sitemaps
 
UnknownFormatException() - Constructor for exception crawlercommons.sitemaps.UnknownFormatException
Default constructor - initializes instance variable to unknown
UnknownFormatException(String) - Constructor for exception crawlercommons.sitemaps.UnknownFormatException
Constructor receives some kind of message that is saved in an instance variable.
UNSET_CRAWL_DELAY - Static variable in class crawlercommons.robots.BaseRobotRules
 
url - Variable in class crawlercommons.sitemaps.AbstractSiteMap
 
UrlFetchException - Exception in crawlercommons.fetcher
 
UrlFetchException() - Constructor for exception crawlercommons.fetcher.UrlFetchException
 
UrlFetchException(String, String) - Constructor for exception crawlercommons.fetcher.UrlFetchException
 
UserAgent - Class in crawlercommons.fetcher.http
User Agent enables us to describe characteristics of any Crawler Commons agent.
UserAgent(String, String, String) - Constructor for class crawlercommons.fetcher.http.UserAgent
Set user agent characteristics
UserAgent(String, String, String, String) - Constructor for class crawlercommons.fetcher.http.UserAgent
Set user agent characteristics
UserAgent(String, String, String, String, String) - Constructor for class crawlercommons.fetcher.http.UserAgent
Set user agent characteristics

V

valueOf(String) - Static method in enum crawlercommons.fetcher.AbortedFetchReason
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum crawlercommons.fetcher.http.BaseHttpFetcher.RedirectMode
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum crawlercommons.fetcher.RedirectFetchException.RedirectExceptionReason
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum crawlercommons.robots.SimpleRobotRules.RobotRulesMode
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum crawlercommons.sitemaps.AbstractSiteMap.SitemapType
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
Returns the enum constant of this type with the specified name.
values() - Static method in enum crawlercommons.fetcher.AbortedFetchReason
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum crawlercommons.fetcher.http.BaseHttpFetcher.RedirectMode
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Method in class crawlercommons.fetcher.Payload
 
values() - Static method in enum crawlercommons.fetcher.RedirectFetchException.RedirectExceptionReason
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum crawlercommons.robots.SimpleRobotRules.RobotRulesMode
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum crawlercommons.sitemaps.AbstractSiteMap.SitemapType
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
Returns an array containing the constants of this enum type, in the order they are declared.
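The valueOf(String) and values() methods listed above are the ones the Java compiler generates for every enum. As a rough illustration of how they behave, here is a sketch using a hypothetical `Frequency` enum (its constants are invented for the example) together with a null-returning lookup helper, `parseOrNull`, which is also an assumption rather than part of the library.

```java
import java.util.Locale;

final class EnumLookupSketch {
    // Hypothetical enum used only for illustration.
    enum Frequency { ALWAYS, HOURLY, DAILY }

    // valueOf throws IllegalArgumentException for an unknown name;
    // this helper converts that case into a null result instead.
    static Frequency parseOrNull(String name) {
        if (name == null) {
            return null;
        }
        try {
            return Frequency.valueOf(name.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // values() returns the constants in declaration order.
        for (Frequency frequency : Frequency.values()) {
            System.out.println(frequency);
        }
        System.out.println(parseOrNull("daily"));
        System.out.println(parseOrNull("sometimes"));
    }
}
```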

W

WILD_CARD - Static variable in class crawlercommons.url.EffectiveTldFinder
 
writeBaseFields(DataOutput) - Method in exception crawlercommons.fetcher.BaseFetchException
 

_

_acceptLanguage - Variable in class crawlercommons.fetcher.http.BaseHttpFetcher
 
_defaultMaxContentSize - Variable in class crawlercommons.fetcher.BaseFetcher
 
_maxConnectionsPerHost - Variable in class crawlercommons.fetcher.http.BaseHttpFetcher
 
_maxContentSizes - Variable in class crawlercommons.fetcher.BaseFetcher
 
_maxRedirects - Variable in class crawlercommons.fetcher.http.BaseHttpFetcher
 
_maxThreads - Variable in class crawlercommons.fetcher.http.BaseHttpFetcher
 
_minResponseRate - Variable in class crawlercommons.fetcher.http.BaseHttpFetcher
 
_redirectMode - Variable in class crawlercommons.fetcher.http.BaseHttpFetcher
 
_userAgent - Variable in class crawlercommons.fetcher.http.BaseHttpFetcher
 
_validMimeTypes - Variable in class crawlercommons.fetcher.BaseFetcher
 

Copyright © 2015. All rights reserved.