Crawler-Commons Change Log

Release 0.9 (2017-10-27)
- [Sitemaps] Removed DOM-based sitemap parser (jnioche) #177
- Incorrect domains returned by EffectiveTldFinder (sebastian-nagel) #172
- [Sitemaps] Add namespace-aware DOM/SAX parsing for XML Sitemaps (Marko Milicevic, jnioche, sebastian-nagel) #176
- Upgraded Tika 1.16 (jnioche) #175
- [Sitemaps] Sitemap SAX parsing mangles target URLs (jnioche, sebastian-nagel) #169
- [Sitemaps] RSS parser ignores pubDate of link (MichealKum via kkrugler) #166

Release 0.8 (2017-06-09)
- Upgraded Tika 1.15 (jnioche) #163
- [Sitemaps] Disable XML resolvers (sebastian-nagel) #151
- Update forbiddenapis to v2.3 (jnioche) #99
- [Sitemaps] gzipped text files fail to parse (sebastian-nagel) #143
- [Sitemaps] Optionally use SAX parser (matt-deboer, jnioche, sebastian-nagel) #116
- [Sitemaps] Properly log XML parsing errors (sebastian-nagel) #146
- Use StandardCharsets where applicable (sebastian-nagel) #141
- Increase sitemap size limit to 50MB (Chaiavi) #132
- Remove dependencies to system-specific locale (sebastian-nagel) #137
- BasicURLNormalizer: NPE for URLs without authority (sebastian-nagel) #136
- BasicURLNormalizer to strip empty port (sebastian-nagel) #133
- Remove deprecated HTTP fetcher (kkrugler) #96

Release 0.7 (2016-11-24)
- Upgrade to JDK 1.8 (lewismc) #126
- [Sitemaps] SitemapParser methods now protected (michaellavelle) #124
- [Sitemaps] Faster parsing of dates (jnioche) #117
- Upgraded Tika 1.13 (jnioche) #113
- Fix license headers (jnioche) #108
- Rename package crawlercommons.url (jnioche) #107
- Sitemap url is not extracted if user agent matches earlier in file (srwilson, kkrugler) #112
- Deprecate HTTP fetcher support (kkrugler) #92
- Added URLFilter interface + BasicURLNormalizer (jnioche) #106
- Updated tld names from publicsuffix.org (jnioche) #100
- Upgraded http-client to version 4.5.1 (aecio via kkrugler) #84
- Upgraded Tika 1.10 (jnioche) #89
- [Sitemaps] Upgrade Valid / Legal / Strict SitemapUrls (Avi Hayun) #82
- [Sitemaps] Upgrade Valid / Legal / Strict SitemapUrls (Avi Hayun) #60
- Simplify pom file (jnioche, lewismc) #77
- Upgrade javac.src.version and javac.target.version to 1.7 or 1.8 (lewismc) #93
- [Sitemaps] Not able to detect RSS feeds (yogendrasoni via kkrugler) #87
- [Robots] Added javadoc comments to the SimpleRobotRulesParser class (MichaelRoeder, kkrugler) #95

Release 0.6 (2015-05-27)
- Issue 75: [Sitemaps] more robust parsing of XML elements (jnioche, kkrugler)
- Issue 76: maven-java-formatter-plugin (jnioche)
- Issue 73: Switch groupID in pom from com.google.code.crawler-commons to crawler-commons (jnioche)
- Issue 71: Upgrade to Tika 1.8 (jnioche)
- Issue 68: [Robots] Path matching should be case-sensitive (kkrugler)
- Issue 67: [Sitemaps] Parsing of lastMod date should use time portion (kkrugler)
- Issue 59: [Robots] Let SimpleRobotRules and its members implement the Serializable interface (kkrugler)
- Issue 65: [Sitemaps] Make SiteMapTool simpler by removing the Recursive flag (Avi Hayun)
- Issue 64: Upgraded to Tika 1.7 (jnioche)
- Issue 32: [Robots] Resolve relative URL for sitemaps (jnioche)
- Issue 62: [Sitemaps] Add new parseSiteMap method (jnioche)
- Issue 57: [Sitemaps] SiteMap should contain a list of SitemapUrls instead of a table of them (Avi Hayun)
- Issue 51: Upgrade httpclient to the latest version (Avi Hayun)
- Issue 61: [Sitemaps] Sitemap Parser changes the processed flag unnecessarily (Avi Hayun)
- Issue 56: [Sitemaps] SiteMap.setBaseUrl(...) causes the domain name to be lowercased, which shouldn't happen (Avi Hayun)
- Issue 50: Add Fetch Report to FetchedResult (lewismc, avraham2)
- Issue 55: [Sitemaps] SitemapUrl "setPriority(String str)" should check for proper value (Avi Hayun)

Release 0.5 (2014-10-15)
- Issue 53: Spaces in a comma separated list of names in a User-agent: line cause rules to be applicable to all agents (kkrugler)
- Issue 45: [Sitemaps] Upgrade code after release of Tika v1.6 (Avi Hayun)
- Issue 48: Upgraded to Tika 1.6 (jnioche)
- Issue 47: [Sitemaps] SiteMapParser Tika detection doesn't work well in some cases (Avi Hayun)
- Issue 40: [Sitemaps] Add Tika MediaType Support (Avi Hayun)
- Issue 39: [Sitemaps] Add a convenience method to the Parser with only a URL argument (Avi Hayun via lewismc)
- Issue 42: [Sitemaps] Add more JUnit tests (Avi Hayun via lewismc)
- Issue 37: Upgrade the Slf4j logging Library to v1.7.7 (Avi Hayun via kkrugler)
- Issue 41: Upgrade to JUnit v4 conventions in SiteMapParser (Avi Hayun via lewismc)
- Issue 34: Upgrade the Slf4j logging in SiteMaps (Avi Hayun via lewismc)

Release 0.4 (2014-04-11)
- Issue 13: Fix deprecation in Crawler Commons Code (lewismc via kkrugler)
- Issue 8: Upgrade of httpclient to v4.2.6 (Fuad Efendi, lewismc via kkrugler)
- Issue 18: Support matching against query parameters in robots.txt rules (alparslanavci, kkrugler)
- Issue 21: Follow Google example of giving Allow directives higher match weight than Disallow directives (y.vladimirov via kkrugler)
- Issue 22: Use longest-match-wins approach to matching URLs in robots.txt (kkrugler)
- Issue 17: Support Googlebot-compatible regular expressions in URL specifications (alparslanavci, kkrugler)
- Issue 31: Missing top level domains (jnioche, kkrugler)
- Issue 23: Trivial improvements to UserAgent (lewismc)
- Issue 30: SitemapIndex should allow to skip sitemaps (Sebastian Nagel, kkrugler)
- cleanup of ANT build remnants [lib and lib-ext] (jnioche)

Release 0.3 (2013-10-11)
- Upgraded to Tika 1.4 (jnioche)
- [SiteMap] added utility class for testing sitemaps (jnioche)
- Issue 16: remove ant scripts and configuration (lewismc)
- Issue 27: [SiteMap] Unnecessary String concatenations when logging + in SiteMapURL.toString() (jnioche)
- Issue 26: [SiteMap] Set correct default priority for URL in a sitemap file (jnioche)
- Issue 25: [Robots] Robots parser should not lowercase sitemap URLs (jnioche)
- Issue 29: [SiteMap] try urls when <loc> element is missing (jnioche)

Release 0.2 (2013-02-02)
- Move to pure Maven for CC build lifecycle (lewismc)
- Move Javadoc out of core code (lewismc)
- Substantiate Javadoc (lewismc)
- Review default.properties (lewismc)
- add HTTP status code & reason to FetchedResult (Fuad Efendi via kkrugler)
- support for multiple user agent names (Tejas Patil via kkrugler)
- added javadoc generation, publish in /doc/javadoc (kkrugler)
- switch to using eclipse-formatter.properties (kkrugler)
- support robots.txt files that have UTF-16LE and UTF-16BE BOMs (kkrugler)
- support for user agent names that contain spaces (kkrugler)
- fixed handling of BOM in sitemaps (Vivek Magotra via kkrugler)
- refactoring of SiteMap objects (Hannes Schwarz via jnioche)
- added simple support for the file: protocol (kkrugler)
- cleaned up packaging and added "install" target (kkrugler)

Release 0.1
- parsing robots.txt
- parsing sitemaps
- URL analyzer which returns Top Level Domains
- a simple HttpFetcher