mirror of https://github.com/crawler-commons/crawler-commons synced 2024-09-22 09:10:42 +02:00
A set of reusable Java components that implement functionality common to any web crawler
kkrugler_lists@transpac.com 78e4ae5e9e Added test to validate proper handling of user agent crawler names that consist of multiple words. 2012-08-15 14:00:24 +00:00
doc Added Apache License 2.0 2011-07-06 14:52:36 +00:00
lib Add jar that's only in (currently unavailable) 101tec Nexus repo, so at least users can manually install it 2011-07-01 17:42:11 +00:00
src Added test to validate proper handling of user agent crawler 2012-08-15 14:00:24 +00:00
build.properties changing version to 0.2-SNAPSHOT 2011-07-06 18:49:12 +00:00
build.xml Added simple support for the file: protocol. 2011-07-21 17:28:53 +00:00
CHANGES.txt added CHANGES.txt + refactoring of SiteMap objects (thanks to Hannes Schwarz) 2011-07-25 10:23:21 +00:00
pom.xml Fixed handling of BOM in sitemaps (from Vivek Magotra) 2012-08-14 16:22:32 +00:00