Aecio Santos
f2bf9300e6
Upgrades httpclient to version 4.5.1 (fixes #84)
...
and do not ignore test failures during maven build
2015-10-09 14:08:39 -04:00
Julien Nioche
b7ccc8d1f1
Fixed test for domains #103
2015-10-07 10:21:28 +01:00
Julien Nioche
98316a51fc
issue #100 in CHANGES.txt
2015-10-06 18:48:58 +01:00
Julien Nioche
a7728c6733
Merge pull request #100 from DigitalPebble/tld_names
...
updated tld names from publicsuffix.org
2015-10-06 18:44:33 +01:00
Julien Nioche
9e93037e79
updated tld names from publicsuffix.org
2015-10-05 13:38:10 +01:00
Ken Krugler
9e9f5df884
Fixed up CHANGES.txt file
2015-09-15 07:57:13 -07:00
Julien Nioche
f0d71b4729
mentioned issue 89 in CHANGES
2015-09-15 11:38:40 +01:00
Lewis John McGibbney
f2e41af53c
Trivial commit to update CHANGES.txt for recent commits.
2015-09-14 22:40:04 -07:00
Ken Krugler
2c687d1bba
Roll in fix for issue #87 w/RSS 1.0 site maps
2015-09-11 15:16:12 -07:00
Ken Krugler
d08f396576
Tweaked Javadoc update from Michael Roeder
2015-09-11 11:49:20 -07:00
Ken Krugler
ba7f22c811
Merge branch 'MichaelRoeder-additionalJavadoc'
2015-09-11 11:23:07 -07:00
MichaelRoeder
e8f38fda03
Added a javadoc comment to the SimpleRobotRulesParser class explaining its behavior.
2015-09-10 13:03:10 +02:00
Ken Krugler
16e13bedc4
Improve Javadoc on robot name matching
...
And add a .gitignore
2015-09-08 16:12:31 -07:00
Lewis John McGibbney
24d24b3de1
Merge pull request #94 from lewismc/jdk1.7
...
Upgrade to Jdk1.7
2015-09-07 20:22:59 -04:00
Lewis John McGibbney
d7ed6a742c
Upgrade to Jdk1.7 - remove license header at pom.xml and improve logging implementations.
2015-09-07 15:20:00 -04:00
Lewis John McGibbney
c385883ec3
Merge branch 'master' into jdk1.7
2015-09-07 14:17:29 -04:00
Avi Hayun
478a7d7240
Merge pull request #82 from crawler-commons/validSitemaps
...
[Sitemaps] Upgrade Valid / Legal / Strict SitemapUrls
2015-09-07 21:11:58 +03:00
Lewis John McGibbney
ba5906ec40
Upgrade to JDK 1.7 compiler version and introduce Maven forbidden API's plugin
2015-09-06 13:55:26 -04:00
Lewis John McGibbney
827b073d12
Merge branch 'master' into validSitemaps
2015-08-26 09:50:26 -07:00
Julien Nioche
f155148216
Upgraded Tika 1.10 #89
2015-08-20 15:35:38 +01:00
matt-deboer
d0e1f1f124
Added docker-rest script (helpful on OSX) to re-create boot2docker vm and update env. vars in ~/.profile
2015-07-23 14:41:21 -07:00
matt-deboer
d203f0d4ac
Reworked sitemap parser to use SAX for optional parsing of partial docs.
...
Traded Stack for LinkedList for performance improvement.
Fix to getParentElement();
Added test for case referenced by issues #79 and #75.
2015-06-27 22:33:15 -07:00
Lewis John McGibbney
cd06d834a6
Update README.md
2015-06-15 12:40:53 -07:00
Julien Nioche
0f24082dc0
Applied formatting with mvn java-formatter:format
2015-06-11 10:47:19 +01:00
Julien Nioche
feb40af519
Applied formatting with mvn java-formatter:format
2015-06-11 10:45:06 +01:00
Julien Nioche
37c13c8465
Update README.md
...
added link to javadoc
2015-06-11 10:39:45 +01:00
Julien Nioche
b77fa0052a
Update README.md
...
Announcing 0.6 release
2015-06-11 10:37:52 +01:00
Chaiavi
5cf62ab7d5
Fix for Issue 60
...
SitemapUrls can be invalid when they are referenced in a sitemap whose directory is on a completely different path than the referenced SitemapUrl.
All as indicated here:
http://www.sitemaps.org/protocol.html#location
In order to clarify the validity aspect we need to upgrade the following:
1. Add a little more explanation as javadocs and as logs
2. Rename "Legal" (I think only one occurrence) to "valid" (in the parser)
3. Add to the Sitemap class a new method to get all *valid* SitemapUrls
4. When dropping a URL due to invalidity a log should be shown; a URL shouldn't be dropped quietly.
2015-06-08 23:41:56 +03:00
Julien Nioche
504c207488
Added Julien's public key to KEYS
2015-06-04 10:50:14 +01:00
Lewis John McGibbney
9d45376336
Add KEYS file to CC
2015-06-01 21:55:49 -07:00
Julien Nioche
22206f3a43
[maven-release-plugin] prepare for next development iteration
2015-05-27 16:38:05 +01:00
Julien Nioche
39d076a13b
[maven-release-plugin] prepare release crawler-commons-0.6
2015-05-27 16:38:01 +01:00
Julien Nioche
2394b6713a
Removed tagBase from maven-release-plugin configuration
2015-05-27 16:36:05 +01:00
Julien Nioche
ee4a936066
Revert "[maven-release-plugin] prepare release crawler-commons-0.6"
...
This reverts commit 3b09a9ba52.
2015-05-27 16:16:54 +01:00
Julien Nioche
3b09a9ba52
[maven-release-plugin] prepare release crawler-commons-0.6
2015-05-27 16:05:02 +01:00
Julien Nioche
a41ab43c41
README Fixed URL for changes file release 0.5
...
was pointing to the 'live' file
2015-05-27 12:18:40 +01:00
Julien Nioche
e8ec75e019
Reverted failed release + changed groupId
2015-05-27 12:16:18 +01:00
Julien Nioche
d115f158b2
[maven-release-plugin] prepare for next development iteration
2015-05-26 10:58:35 +01:00
Julien Nioche
8328e554d4
[maven-release-plugin] prepare release crawler-commons-0.6
2015-05-26 10:58:31 +01:00
Julien Nioche
20861baf47
Issue 75: [Sitemaps] more robust parsing of XML elements (jnioche, kkrugler)
2015-05-22 11:08:21 +01:00
Julien Nioche
40731c3304
applied formatting with mvn java-formatter:format
2015-05-15 09:03:24 +01:00
Julien Nioche
8de545ccdc
Merge pull request #78 from lewismc/CC-77
...
simplify pom file #77
2015-05-15 08:55:45 +01:00
Lewis John McGibbney
e8065d5372
simplify pom file #77
2015-05-14 12:05:37 -07:00
Julien Nioche
47e30b5c22
Merge pull request #76 from crawler-commons/formatter
...
maven-java-formatter-plugin
2015-05-06 10:10:56 +01:00
Julien Nioche
63a837d5d7
updated CHANGES.txt for #76
2015-05-06 10:10:25 +01:00
Julien Nioche
d22a9d0617
removed properties file + 1.6 compliant formatting
2015-05-06 09:33:31 +01:00
Julien Nioche
0a5d9d338a
maven-java-formatter-plugin
2015-04-30 13:52:44 +01:00
Ken Krugler
e42c268e03
Add news about project moving to GitHub
2015-04-22 07:59:43 -07:00
Ken Krugler
ee23e1fb0d
Update CHANGES.txt
2015-04-22 07:31:20 -07:00
Ken Krugler
798dc59839
Update CHANGES.txt
2015-04-22 07:30:57 -07:00