Crawler-Commons Change Log

Release 0.5
- Issue 45: [Sitemaps] Upgrade code after release of Tika v1.6 (Avi Hayun)
- Issue 48: Upgraded to Tika 1.6 (jnioche)
- Issue 47: [Sitemaps] SiteMapParser Tika detection doesn't work well in some cases (Avi Hayun)
- Issue 40: [Sitemaps] Add Tika MediaType Support (Avi Hayun)
- Issue 39: [Sitemaps] Add a convenience method to the Parser that takes only a URL argument (Avi Hayun via lewismc)
- Issue 42: [Sitemaps] Add more JUnit tests (Avi Hayun via lewismc)
- Issue 37: Upgrade the Slf4j logging Library to v1.7.7 (Avi Hayun via kkrugler)
- Issue 41: Upgrade to JUnit v4 conventions in SiteMapParser (Avi Hayun via lewismc)
- Issue 34: Upgrade the Slf4j logging in SiteMaps (Avi Hayun via lewismc)

Release 0.4
- Issue 13: Fix deprecation in Crawler Commons Code (lewismc via kkrugler)
- Issue 8: Upgrade of httpclient to v4.2.6 (Fuad Efendi, lewismc via kkrugler)
- Issue 18: Support matching against query parameters in robots.txt rules (alparslanavci, kkrugler)
- Issue 21: Follow Google example of giving Allow directives higher match weight than Disallow directives (y.vladimirov, via kkrugler)
- Issue 22: Use longest-match-wins approach to matching URLs in robots.txt (kkrugler)
- Issue 17: Support Googlebot-compatible regular expressions in URL specifications (alparslanavci, kkrugler)
- Issue 31: Missing top level domains (jnioche, kkrugler)
- Issue 23: Trivial improvements to UserAgent (lewismc)
- Issue 30: SitemapIndex should allow to skip sitemaps (Sebastian Nagel, kkrugler)
- cleanup of ANT build remnants [lib and lib-ext] (jnioche)

Release 0.3
- Upgraded to Tika 1.4 (jnioche)
- [SiteMap] added utility class for testing sitemaps (jnioche)
- Issue 16: remove ant scripts and configuration (lewismc)
- Issue 27: [SiteMap] Unnecessary String concatenations when logging + in SiteMapURL.toString() (jnioche)
- Issue 26: [SiteMap] Set correct default priority for URL in a sitemap file (jnioche)
- Issue 25: [Robots] Robots parser should not lowercase sitemap URLs (jnioche)
- Issue 29: [SiteMap] try urls when <loc> element is missing (jnioche)

Release 0.2
- Move to pure Maven for CC build lifecycle (lewismc)
- Move Javadoc out of core code (lewismc)
- Substantiate Javadoc (lewismc)
- Review default.properties (lewismc)
- add HTTP status code & reason to FetchedResult (Fuad Efendi via kkrugler)
- support for multiple user agent names (Tejas Patil via kkrugler)
- added javadoc generation, publish in /doc/javadoc (kkrugler)
- switch to using eclipse-formatter.properties (kkrugler)
- support robots.txt files that have UTF-16LE and UTF-16BE BOMs (kkrugler)
- support for user agent names that contain spaces (kkrugler)
- fixed handling of BOM in sitemaps (Vivek Magotra via kkrugler)
- refactoring of SiteMap objects (Hannes Schwarz via jnioche)
- added simple support for the file: protocol (kkrugler)
- cleaned up packaging and added "install" target (kkrugler)

Release 0.1
- parsing robots.txt
- parsing sitemaps
- URL analyzer which returns Top Level Domains
- a simple HttpFetcher